ABSTRACT

One of the most crucial problems in theoretical and applied statistics is to determine the precision of the estimates produced by different statistical estimators. This problem becomes considerably harder when the parametric characteristics of the population are not known. Closely related to it is the problem of deciding how large (or small) the sample must be in order to obtain a desired precision within a certain range.

There are several non-parametric methods to approach the first problem. The BOOTSTRAP method (Efron, 1979) is one of these approaches and is the one of interest in this thesis. With this method, one can improve the precision of the estimates and gain information about the distributional characteristics of statistical estimators. The bootstrap method has been extensively compared with other methods. The results show that it often produces more precise estimates (i.e., estimates with smaller mean squared error) than competitors such as the JACKKNIFE, SECTIONING and CROSS-VALIDATION. However, these results are based on large sample sizes and large numbers of "bootstrap" replications.

This thesis analyzes the behavior of the BOOTSTRAP method when the number of bootstrap replications is small. It tries to identify tradeoffs between the sample size and the number of bootstrap replications required to attain a desired precision in the estimates produced in several particular situations. One of the goals is to produce graphical displays that show the experimental statistician two things: the price paid in the precision of bootstrap estimates when the sample size is small, and the number of bootstrap replications to use in this situation.
I.  INTRODUCTION

A.      BACKGROUND 

One of the most common problems in applied statistics is the estimation of an unknown parameter $\theta$. Suppose the statistician has decided on a model with one or more parameters to be estimated. Suppose also that he or she has selected the estimator (e.g., the m.l.e., the least-squares estimator, etc.) that will be used to obtain the estimates. The second problem is then how to estimate the accuracy of these estimates. There are several ways of measuring the accuracy, or the error, of statistical estimators. In this thesis, the measure of statistical error is the mean squared error (MSE) of the estimator, i.e., the variance plus the squared bias of $\hat\theta$, where $\hat\theta$ represents the estimator of the parameter $\theta$ (Appendix A lists the special notation used in this thesis):

$$\mathrm{MSE}(\hat\theta) = E[(\hat\theta - \theta)^2] = \mathrm{Var}(\hat\theta) + [\mathrm{BIAS}(\hat\theta)]^2 \qquad (1.1)$$
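The Python sketch below gives a quick numerical illustration of (1.1) using only the standard library. It simulates the estimator $\sum_i (X_i - \bar X)^2 / n$ of $\sigma^2 = 1$ on repeated N(0, 1) samples. It then checks that the empirical MSE splits into empirical variance plus squared bias. The sample size, number of repetitions and seed are arbitrary illustrative choices, not values used in this thesis.

```python
import random
import statistics


def empirical_mse_parts(estimates, theta):
    """Return (mse, variance, bias) of simulated estimates of theta.

    All three use divisor len(estimates), so mse == variance + bias**2
    up to floating-point rounding, mirroring equation (1.1).
    """
    mean_est = statistics.fmean(estimates)
    mse = statistics.fmean([(t - theta) ** 2 for t in estimates])
    variance = statistics.pvariance(estimates, mu=mean_est)
    return mse, variance, mean_est - theta


def plug_in_variance(sample):
    """Variance estimator with divisor n (biased for sigma^2)."""
    return statistics.pvariance(sample)


rng = random.Random(1)
n, reps, sigma2 = 10, 5000, 1.0  # illustrative choices only
estimates = [plug_in_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
             for _ in range(reps)]
mse, var, bias = empirical_mse_parts(estimates, sigma2)
print(f"MSE={mse:.4f}  Var={var:.4f}  Bias^2={bias ** 2:.4f}")
```

With $n = 10$ the bias should come out close to $-\sigma^2/n = -0.1$, since $E[\sum_i (X_i - \bar X)^2/n] = \frac{n-1}{n}\sigma^2$.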

Sometimes the practitioner is dealing with samples obtained from populations whose distributional characteristics are known. In that case, classical statistical theory answers the second problem, since, at least in theory, the variance and the bias of most statistical estimators can be calculated analytically. However, the difficulty of analytically deriving the MSE of a statistical estimator increases as the mathematical definition of the estimator becomes more complicated. When this is the case, or when the practitioner does not actually know the probability distribution, say $F$, from which the sample was obtained, the MSE of the estimator must itself be estimated.

There are several non-parametric methods for estimating the bias and the variance of an estimator of interest. The most common ones are the Quenouille-Tukey JACKKNIFE method, CROSS-VALIDATION, and SECTIONING, the jackknife being the most commonly used of the three. Efron and Gong [Ref 1] and Miller [Ref 2] provide an excellent exposition of the first two methods, and Lewis gives a good introduction to and analysis of the latter (see [Ref 3]).




and then $X^* \overset{\text{iid}}{\sim} \hat F$. The task is then to estimate the sampling distribution of $\hat\theta$ by the distribution of $\hat\theta^*(\hat F)$, where $\hat\theta^*(\hat F)$ denotes the value of the statistic of interest computed from a bootstrap sample. The bootstrap mechanism proceeds as follows: keeping $\hat F$ fixed, draw a bootstrap sample and calculate $\hat\theta^*(\hat F)$. Do this a large number $B$ of times, obtaining $\hat\theta^*_1(\hat F), \hat\theta^*_2(\hat F), \ldots, \hat\theta^*_B(\hat F)$. The resulting (sample) distribution of $\hat\theta^*$ is called the bootstrap distribution $\hat F^*$. Once $\hat F^*$ is obtained, any specific feature of this distribution can be computed. Examples are the expected value of $\hat\theta^*$, $E^*(\hat\theta^*)$, and the variance of $\hat\theta^*$, $\mathrm{Var}^*(\hat\theta^*)$. (In this thesis, notation such as $E^*$, $\mathrm{Var}^*$, $S^{*2}$, $\bar X^*$, etc., indicates calculations relating to the conditional bootstrap distribution of $X^*$. In these calculations the vector of random variates $X$, and hence $\hat F$, is held fixed.¹) In principle, then, the bootstrap idea can be used to estimate the expected value, the variance, and the mean squared error of any estimator, given a sample that comes from an unknown probability distribution $F$.

As mentioned earlier, Efron (see [Ref 4]) has shown that this method is often more precise than other non-parametric methods for assessing statistical accuracy. However, past experimentation with this method relied on a large number $B$ of bootstrap replications, i.e., a large sample of $\hat\theta^*$. In some cases it can be shown (see Chapter Two for the case of $\mathrm{Var}^*(\hat\theta^*)$) what happens as $B \to \infty$. The variance of $\hat\theta^*$ computed from the Monte Carlo bootstrap distribution $\hat F^*$ then converges to the exact variance of the estimator $\hat\theta^*$ under $\hat F$. But how large must $B$ be to obtain accurate estimates, or estimators with a small MSE? That question remains to be answered. Also, what is the tradeoff between the sample size $n$ and the number $B$ of bootstrap replications?

The purpose of this thesis is then twofold. The first aim is to analyze the performance of the bootstrap as the number $B$ of replications increases, starting from a small $B$. The second, also of great interest, is to study the relationship between the sample size $n$ and the number $B$ when the bootstrap mechanism is used to estimate the MSE of an estimator.

C.       ORGANIZATION 

There are several methods of determining the bootstrap distribution of an estimator $\hat\theta^*(\hat F)$, two of which will be analyzed in this thesis.² The first is by direct theoretical calculation (this is usually the most difficult approach). The second relies on Monte Carlo approximations to the bootstrap distribution. Repeated realizations of $X^*$ are generated by taking random samples of size $n$ from $\hat F$, say $x^{*1}, x^{*2}, \ldots, x^{*B}$. The histogram of the corresponding values $\hat\theta^*_1(\hat F), \hat\theta^*_2(\hat F), \ldots, \hat\theta^*_B(\hat F)$ is then constructed as an approximation to the actual bootstrap distribution (see [Ref 1: Section 2]). These two methods are the subject of Chapter Two. In the last section of Chapter Two, the different statistical experiments conducted for this thesis are explained in detail. In Chapter Three, the results from these experiments are presented and analyzed, and the problem of using the bootstrap approach in linear regression problems is also discussed. Conclusions are presented in the last chapter. One point of interest there is the main disadvantage of the bootstrap methodology: the computer time required to implement it when Monte Carlo simulation is used. Appendix B explains and lists the FORTRAN software designed to run the experiments discussed in this thesis. This program is user friendly and can be used to estimate the bootstrap distribution of eight different estimators. Finally, Appendix C gives tables that indicate how large (or small) $B$ and $n$ can be in order to obtain a desired precision in the estimates of parameters of given populations $F$.

¹ As will be shown in the next chapter, this is a critical feature of the BOOTSTRAP method: the vector of random variates $X$, and hence $\hat F$, must be held fixed throughout the process.

² A third method involves making Taylor series expansions to obtain the approximate mean and variance of the bootstrap distribution $\hat F^*$; see [Ref 4, Section 5].


1.    Direct  Analytical  Calculations 

An attempt is now made to calculate some parameters of interest of the distribution of $X_j^*$. Assuming the conditions shown in expressions (2.1) and (2.2), the expected value of $X_j^*$, given $X$, can be calculated as follows:

$$E^*(X_j^*) = E(X_j^* \mid X = x) = \sum_i x_i \, P(X_j^* = x_i \mid X = x), \qquad (2.3)$$

where $j = 1, 2, \ldots, n$. From (2.2), this is equal to:

$$E^*(X_j^*) = \sum_i \frac{x_i}{n} = \bar X, \qquad j = 1, 2, \ldots, n, \qquad (2.4)$$

which is the sample mean of the original sample $X$. Then, from (2.4), the unconditional expected value of $X_j^*$ is:

$$E(X_j^*) = E[E^*(X_j^* \mid X)] = E(\bar X) = \mu_X, \qquad j = 1, 2, \ldots, n. \qquad (2.5)$$

Thus, the unconditional expectation of $X_j^*$ is equal to the mean of the population from which the original sample was obtained. (Note: from this point on, all summation signs run from 1 to $n$ unless otherwise specified, and $E^*$, $\mathrm{Var}^*$, etc., are conditional, given $X$.)

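Likewise, the unconditional variance of $X_j^*$ can be derived from the conditional variance of $X_j^*$:

$$\mathrm{Var}^*(X_j^*) = E^*[(X_j^* - E(X_j^* \mid X = x))^2]. \qquad (2.6)$$

Using (2.5), this expression is equivalent to:

$$\begin{aligned} \mathrm{Var}^*(X_j^*) &= E[(X_j^* - \bar X)^2 \mid X] \\ &= E^*(X_j^{*2}) - \bar X^2 \\ &= \sum_i \frac{x_i^2}{n} - \bar X^2 \\ &= \sum_i \frac{(x_i - \bar X)^2}{n}. \end{aligned} \qquad (2.7)$$

By the definition of the sample variance $S_X^2$, then

$$\mathrm{Var}^*(X_j^*) = \frac{n-1}{n} S_X^2. \qquad (2.8)$$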
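Given $X = x$, a single bootstrap draw is uniform on the observed values, so (2.4) and (2.8) can be verified exactly with rational arithmetic. Below is a minimal Python sketch, assuming an arbitrary five-point sample.

```python
import statistics
from fractions import Fraction


def one_draw_moments(x):
    """Exact E* and Var* of a single bootstrap draw X_j^*, given the sample x.

    Given X = x, X_j^* equals each x_i with probability 1/n, so its
    conditional moments are plain 1/n-weighted sums.
    """
    p = Fraction(1, len(x))
    mean = sum(p * xi for xi in x)                # equation (2.4)
    var = sum(p * (xi - mean) ** 2 for xi in x)   # equation (2.7)
    return mean, var


x = [Fraction(v) for v in (3, 1, 4, 1, 5)]        # an arbitrary small sample
n = len(x)
mean, var = one_draw_moments(x)
assert mean == statistics.mean(x)
assert var == Fraction(n - 1, n) * statistics.variance(x)  # equation (2.8)
print(mean, var)
```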
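Now, unconditionally,

$$\begin{aligned} \mathrm{Var}(X_j^*) &= E[\mathrm{Var}^*(X_j^*)] + \mathrm{Var}[E^*(X_j^*)] \\ &= E\Big[\sum_i \frac{X_i^2}{n} - \bar X^2\Big] + \mathrm{Var}(\bar X) \\ &= E\Big[\frac{n-1}{n} S_X^2\Big] + \frac{\sigma_X^2}{n} \\ &= \frac{n-1}{n} E(S_X^2) + \frac{\sigma_X^2}{n} \\ &= \frac{n-1}{n}\sigma_X^2 + \frac{\sigma_X^2}{n} = \sigma_X^2. \end{aligned} \qquad (2.9)$$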

Therefore, the (unconditional) variance of $X_j^*$ is the same as the variance of $X_j$. The covariance between $X_i^*$ and $X_j^*$ has a very important impact on the bootstrap methodology, primarily when the bootstrap distribution of $\hat\theta^*(\hat F)$ is approximated by Monte Carlo simulation (see next section).

Conditionally (given $X$), the covariance between $X_i^*$ and $X_j^*$ is as follows:

$$\mathrm{Cov}^*(X_i^*, X_j^*) = E^*[(X_i^* - E^*(X_i^*))(X_j^* - E^*(X_j^*))]. \qquad (2.10)$$

From (2.5), this is

$$\mathrm{Cov}^*(X_i^*, X_j^*) = E^*[(X_i^* - \bar X)(X_j^* - \bar X)] = E^*(X_i^* X_j^*) - \bar X^2. \qquad (2.11)$$

Now, conditionally on $X = x$ and for $i \neq j$, the joint distribution of $(X_i^*, X_j^*)$ is uniform over the points $\{x_1, x_2, \ldots, x_n\} \times \{x_1, x_2, \ldots, x_n\}$. This implies that $(X_i^*, X_j^*) = (x_k, x_l)$ with probability $1/n^2$ for every $k, l$. Then

$$E^*(X_i^* X_j^*) = \sum_k \sum_l \frac{x_k x_l}{n^2} = \frac{1}{n^2}\Big(\sum_k x_k\Big)^2 = \bar X^2, \qquad i \neq j. \qquad (2.12)$$

Finally, the conditional covariance between $X_i^*$ and $X_j^*$ is

$$\mathrm{Cov}^*(X_i^*, X_j^*) = \bar X^2 - \bar X^2 = 0. \qquad (2.13)$$
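The covariance result (2.13) can likewise be checked exactly by enumerating all $n^2$ equally likely pairs $(x_k, x_l)$. The Python sketch below is illustrative only, and the four sample values are made up.

```python
from fractions import Fraction
from itertools import product


def pair_covariance(x):
    """Exact Cov*(X_i^*, X_j^*) for i != j, given the sample x.

    Distinct draws are made independently with replacement, so the pair
    equals (x_k, x_l) with probability 1/n^2 for every k, l (see (2.12)).
    """
    n = len(x)
    w = Fraction(1, n * n)
    xbar = Fraction(sum(x), n)
    e_prod = sum(w * xk * xl for xk, xl in product(x, repeat=2))
    return e_prod - xbar * xbar                    # equation (2.11)


x = [Fraction(v) for v in (2, 7, 1, 8)]
print(pair_covariance(x))
```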

Now, to derive the unconditional covariance between $X_i^*$ and $X_j^*$, it will be convenient to use the result obtained in equation (2.13). To use (2.13), it must be shown that the following equality holds:

$$\mathrm{Cov}(X_i^*, X_j^*) = E[\mathrm{Cov}^*(X_i^*, X_j^*)] + \mathrm{Cov}[E^*(X_i^*), E^*(X_j^*)]. \qquad (2.14)$$

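To show this, notice that the conditional covariance can be defined as

$$\begin{aligned} \mathrm{Cov}(X, Y \mid Z) &= E_{(X,Y)\mid Z}[(XY - E(X \mid Z)E(Y \mid Z)) \mid Z] \\ &= E_{(X,Y)\mid Z}(XY \mid Z) - E(X \mid Z)E(Y \mid Z). \end{aligned} \qquad (2.15)$$

Then

$$\begin{aligned} E_Z[\mathrm{Cov}(X, Y \mid Z)] &= E_Z[E_{(X,Y)\mid Z}(XY \mid Z) - E(X \mid Z)E(Y \mid Z)] \\ &= \{E_Z[E_{(X,Y)\mid Z}(XY \mid Z)] - E_Z[E(X \mid Z)]\,E_Z[E(Y \mid Z)]\} \\ &\quad - \{E_Z[E(X \mid Z)E(Y \mid Z)] - E_Z[E(X \mid Z)]\,E_Z[E(Y \mid Z)]\} \\ &= \mathrm{Cov}(X, Y) - \mathrm{Cov}[E(X \mid Z), E(Y \mid Z)]. \end{aligned} \qquad (2.16)$$

Therefore,

$$\mathrm{Cov}(X, Y) = E_Z[\mathrm{Cov}(X, Y \mid Z)] + \mathrm{Cov}[E(X \mid Z), E(Y \mid Z)]. \qquad (2.17)$$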
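The decomposition (2.17) holds for any joint distribution with finite second moments, so it can be confirmed exactly on a small discrete example. In the Python sketch below, the joint distribution `JOINT` is invented purely for illustration. The helpers `expect`, `cond_expect`, `ex` and `ey` are hypothetical names.

```python
from collections import defaultdict
from fractions import Fraction

# A small, made-up joint distribution of (X, Y, Z): {(x, y, z): probability}.
JOINT = {
    (0, 1, 0): Fraction(1, 8), (1, 0, 0): Fraction(1, 8),
    (2, 3, 0): Fraction(1, 4), (1, 1, 1): Fraction(1, 4),
    (2, 0, 1): Fraction(1, 8), (0, 2, 1): Fraction(1, 8),
}


def expect(f):
    """E[f(X, Y, Z)] under JOINT."""
    return sum(p * f(x, y, z) for (x, y, z), p in JOINT.items())


P_Z = defaultdict(Fraction)          # marginal distribution of Z
for (_, _, z), p in JOINT.items():
    P_Z[z] += p


def cond_expect(g, z0):
    """E[g(X, Y) | Z = z0]."""
    num = sum(p * g(x, y) for (x, y, z), p in JOINT.items() if z == z0)
    return num / P_Z[z0]


def ex(z):
    return cond_expect(lambda x, y: x, z)    # E(X | Z = z)


def ey(z):
    return cond_expect(lambda x, y: y, z)    # E(Y | Z = z)


cov_xy = expect(lambda x, y, z: x * y) - expect(lambda x, y, z: x) * expect(lambda x, y, z: y)
# E_Z[Cov(X, Y | Z)], with Cov(X, Y | Z) as in (2.15)
mean_cond_cov = expect(lambda x, y, z: cond_expect(lambda a, b: a * b, z) - ex(z) * ey(z))
# Cov[E(X | Z), E(Y | Z)]
cov_cond_means = (expect(lambda x, y, z: ex(z) * ey(z))
                  - expect(lambda x, y, z: ex(z)) * expect(lambda x, y, z: ey(z)))
print(cov_xy, mean_cond_cov, cov_cond_means)
```

For this table the three printed values are 13/64, 5/32 and 3/64, and indeed 5/32 + 3/64 = 13/64.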

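With this in mind, the unconditional covariance can finally be computed by using (2.17), of which (2.14) is the special case $Z = X$. The quantity inside the brackets of the first term on the right-hand side of equation (2.14) was shown in (2.13) to be equal to zero. Then, using expression (2.5), equation (2.14) reduces to

$$\mathrm{Cov}(X_i^*, X_j^*) = \mathrm{Cov}(\bar X, \bar X) = \mathrm{Var}(\bar X) = \frac{\sigma_X^2}{n}, \qquad (2.18)$$

and from (2.18) and (2.9), the correlation coefficient is given by

$$\rho(X_i^*, X_j^*) = \frac{1}{n} = P[X_i^* = X_j^*], \qquad (2.19)$$

where the last equality holds when the observed values are distinct.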


Comparing equations (2.13) and (2.18), it can be stated that the bootstrap variates are (conditionally) independent, and hence uncorrelated, as long as $X$ is held fixed. Unconditionally, however, they are correlated.
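Unlike (2.13), result (2.19) concerns unconditional behavior, so it can only be checked by redrawing the original sample each time. The Monte Carlo sketch below does this in Python, assuming N(0, 1) data; any population with finite variance would do. The number of repetitions and the seed are arbitrary.

```python
import random


def unconditional_corr(n, reps, rng):
    """Monte Carlo estimate of Corr(X_1^*, X_2^*) over repeated samples.

    Each repetition draws a fresh sample of size n from N(0, 1) and then two
    bootstrap values from that sample; (2.19) predicts a correlation of 1/n.
    """
    a, b = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        a.append(rng.choice(x))
        b.append(rng.choice(x))
    ma, mb = sum(a) / reps, sum(b) / reps
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


rng = random.Random(2024)
for n in (2, 5, 10):
    print(n, round(unconditional_corr(n, 40_000, rng), 3), 1 / n)
```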

It is now possible to derive the distributional characteristics of some statistical estimators based on the distribution of $X_j^*$. In doing this, it is assumed that the original sample $X$ is fixed, so these derivations are conditional. For example, the expected value and the variance of $\bar X^*$ (the bootstrapped sample mean) are obtained as follows. Using equation (2.5),

$$E^*(\bar X^*) = \bar X, \qquad (2.20)$$

so, unconditionally, the expected value of the bootstrap sample mean is

$$E(\bar X^*) = E(\bar X) = \mu_X. \qquad (2.21)$$

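The conditional variance of the bootstrap sample mean is

$$\begin{aligned} \mathrm{Var}^*(\bar X^*) &= \frac{1}{n^2}\mathrm{Var}^*\Big[\sum_i X_i^*\Big] \\ &= \frac{1}{n^2}\Big[\sum_i \mathrm{Var}^*(X_i^*) + n(n-1)\,\mathrm{Cov}^*(X_i^*, X_j^*)\Big], \quad i \neq j. \end{aligned} \qquad (2.22)$$

From equation (2.13), the conditional variance is then

$$\mathrm{Var}^*(\bar X^*) = \frac{1}{n^2}\sum_i \mathrm{Var}^*(X_i^*) = \frac{1}{n^2}[n\,\mathrm{Var}^*(X_i^*)]. \qquad (2.23)$$

Using equation (2.8), finally

$$\mathrm{Var}^*(\bar X^*) = \frac{n-1}{n^2} S_X^2. \qquad (2.24)$$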
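For small $n$, the exact bootstrap distribution of $\bar X^*$ can be obtained by listing all $n^n$ equally likely ordered resamples. This is the "direct theoretical calculation" route in its most literal form. The Python sketch below uses this enumeration to confirm (2.20) and (2.24) for an arbitrary four-point sample. It is an illustration, not the method used in the thesis software.

```python
import statistics
from fractions import Fraction
from itertools import product


def exact_bootstrap_mean_moments(x):
    """Exact E*(Xbar*) and Var*(Xbar*) by enumerating all n**n resamples.

    Every ordered resample (tuple of indices) has probability 1/n**n given x.
    Only feasible for small n (n = 6 already gives 46,656 resamples).
    """
    n = len(x)
    w = Fraction(1, n ** n)
    means = [sum(x[i] for i in idx) / n for idx in product(range(n), repeat=n)]
    e = sum(w * m for m in means)
    v = sum(w * (m - e) ** 2 for m in means)
    return e, v


x = [Fraction(v) for v in (2, 3, 5, 11)]
e, v = exact_bootstrap_mean_moments(x)
n = len(x)
assert e == statistics.mean(x)                                # equation (2.20)
assert v == Fraction(n - 1, n * n) * statistics.variance(x)   # equation (2.24)
print(e, v)
```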

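With this expression, the unconditional variance of $\bar X^*$ is given by

$$\mathrm{Var}(\bar X^*) = E[\mathrm{Var}^*(\bar X^*)] + \mathrm{Var}[E^*(\bar X^*)]. \qquad (2.25)$$

From equations (2.24) and (2.20),

$$\begin{aligned} \mathrm{Var}(\bar X^*) &= E\Big[\frac{n-1}{n^2} S_X^2\Big] + \mathrm{Var}(\bar X) \\ &= \frac{n-1}{n^2}\sigma_X^2 + \frac{\sigma_X^2}{n} \\ &= \frac{2n-1}{n}\mathrm{Var}(\bar X). \end{aligned}$$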

As mentioned earlier, equation (2.24) is the one of interest when one wants to apply the bootstrap mechanism to obtain the variance of $\bar X$. Notice that, as $n \to \infty$,

$$\frac{\mathrm{Var}^*(\bar X^*)}{\mathrm{Var}(\bar X)} = \frac{n-1}{n}\,\frac{S_X^2}{\sigma_X^2} \to 1 \qquad (2.26)$$

almost surely, since $S_X^2 \to \sigma_X^2$ by the strong law of large numbers. This is not the case for the unconditional variance of $\bar X^*$, for which, as $n \to \infty$,

$$\frac{\mathrm{Var}(\bar X^*)}{\mathrm{Var}(\bar X)} = \frac{2n-1}{n} \to 2. \qquad (2.27)$$
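The factor $(2n-1)/n$ in (2.25) to (2.27) can be seen in a small simulation in which every repetition draws a new sample $X$ and a single bootstrap resample from it. Below is a minimal Python sketch, assuming N(0, 1) data so that $\mathrm{Var}(\bar X) = 1/n$ is known exactly. The function name, repetition count and seed are illustrative.

```python
import random
import statistics


def var_ratio_unconditional(n, reps, rng):
    """Estimate Var(Xbar*) / Var(Xbar) with a fresh sample X each repetition.

    Each repetition draws X ~ N(0, 1) of size n and one bootstrap resample of
    X; the derivation above predicts a ratio of (2n - 1)/n, tending to 2.
    Var(Xbar) = 1/n exactly here because sigma^2 = 1.
    """
    boot_means = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        boot_means.append(statistics.fmean(rng.choices(x, k=n)))
    return statistics.variance(boot_means) / (1.0 / n)


rng = random.Random(11)
for n in (5, 20):
    print(n, round(var_ratio_unconditional(n, 20_000, rng), 3), (2 * n - 1) / n)
```

For $n = 5$ the printed ratio should be near 1.8 and for $n = 20$ near 1.95, up to Monte Carlo error.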

It is now possible to define an estimator for the MSE of the sample mean based on $\bar X^*$:

$$\mathrm{MSE}^*(\bar X^*) = \mathrm{Var}^*(\bar X^*) + [E^*(\bar X^*) - \bar X]^2 = \mathrm{Var}^*(\bar X^*) + [\mathrm{Bias}^*(\bar X^*)]^2. \qquad (2.28)$$

In the same manner, the MSE of any estimator can be derived. However, as the mathematical definition of the estimator gets more complicated, this procedure can clearly become very tedious. This is why it is desirable to estimate the bootstrap distribution of the estimator by simulation rather than analytically.

2.    Monte  Carlo  Simulation 

The algorithm presented in Chapter II, Section A, can be expanded to allow Monte Carlo simulation to approximate the bootstrap distribution of $\hat\theta^*(\hat F)$. As before (see Efron [Ref 2: Section 2]):

(1) given that the realization of the random vector $X$ has been observed, say $X_i = x_i$ for $i = 1, 2, \ldots, n$;

(2) construct the sample probability distribution $\hat F$ by giving a mass $1/n$ to each point $x_1, x_2, \ldots, x_n$;

(3) keeping $x$ (and thus $\hat F$) fixed, draw with replacement a random sample of size $n$ from $\hat F$, and call this a bootstrap sample;

(4) from this random sample, compute the bootstrap replication $\hat\theta^*(\hat F)$, i.e., compute the value of the desired statistic based on the sample from $\hat F$; then

(5) repeat steps (3) and (4) a "large" number $B$ of times. In this way one obtains independent bootstrap replications of $\hat\theta^*(\hat F)$, say $\hat\theta^*_1(\hat F), \hat\theta^*_2(\hat F), \ldots, \hat\theta^*_B(\hat F)$;

(6) now, approximate the variance of $\hat\theta^*(\hat F)$ by the sample variance

$$\widehat{\mathrm{Var}}^*[\hat\theta^*(\hat F)] = \sum_i \frac{[\hat\theta^*_i(\hat F) - \bar\theta^*(\hat F)]^2}{B - 1}, \qquad (2.29)$$

where $i = 1, 2, \ldots, B$, and

$$\bar\theta^*(\hat F) = \sum_i \frac{\hat\theta^*_i(\hat F)}{B}. \qquad (2.30)$$
The MSE of $\hat\theta^*(\hat F)$ may be estimated by

$$\widehat{\mathrm{MSE}}^*[\hat\theta^*(\hat F)] = \widehat{\mathrm{Var}}^*[\hat\theta^*(\hat F)] + [\widehat{\mathrm{BIAS}}^*(\hat\theta^*(\hat F))]^2. \qquad (2.31)$$
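Steps (1) to (6) and equations (2.29) to (2.31) translate almost line by line into code. The Python sketch below is one possible rendering, not the FORTRAN program of Appendix B. The function name `bootstrap_mse` and its `reference` argument are hypothetical. Equation (2.31) does not fix the value against which the bias is measured. The sketch therefore defaults to the statistic of the original sample (the usual bootstrap bias estimate) and lets a known true parameter be passed instead, as in the experiments of Chapter Three.

```python
import random
import statistics
from typing import Callable, Optional, Sequence


def bootstrap_mse(x: Sequence[float], stat: Callable[[Sequence[float]], float],
                  B: int, rng: random.Random,
                  reference: Optional[float] = None):
    """Monte Carlo bootstrap, steps (1)-(6) and equations (2.29)-(2.31).

    reference is the value the bias is measured against (assumption: defaults
    to stat(x)). Requires B >= 2. Returns (var_hat, bias_hat, mse_hat).
    """
    n = len(x)
    # Steps (3)-(5): B resamples of size n drawn with replacement from F-hat.
    reps = [stat(rng.choices(x, k=n)) for _ in range(B)]
    theta_bar = statistics.fmean(reps)                   # equation (2.30)
    var_hat = statistics.variance(reps, xbar=theta_bar)  # equation (2.29), divisor B - 1
    ref = stat(x) if reference is None else reference
    bias_hat = theta_bar - ref
    return var_hat, bias_hat, var_hat + bias_hat ** 2    # equation (2.31)


rng = random.Random(42)
x = [rng.expovariate(1.0) for _ in range(20)]   # one Exponential(1) sample
for B in (5, 25, 100, 500):
    v, b, m = bootstrap_mse(x, statistics.fmean, B, rng, reference=1.0)
    print(B, round(v, 4), round(b, 4), round(m, 4))
```

Any statistic can be passed as `stat`, for example `statistics.median` or `statistics.variance`.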

It will be seen in Chapter Three that as $B$ and $n$ get large, $\widehat{\mathrm{MSE}}^*[\hat\theta^*(\hat F)]$ approaches zero. A key problem in using the bootstrap is the choice of $B$, which is considered in Chapter Three.

This bootstrap simulation procedure was carried out to study the effect of possible choices of $B$ on the estimated MSE of several estimators. The reader will see in the next chapter that the choice of $B$ should depend on three things: the sample size $n$, the specific estimator under consideration, and the structure of the population from which the sample was obtained.

a.   The  Statistical  Experiment 

In this thesis, various experiments were conducted to study the problem of selecting $B$. The main idea behind these experiments was to select some well-known probability distributions and some parametric estimators whose distributional characteristics are well known. The MSE of these estimators could then be determined theoretically, and this true MSE could be compared with the estimated MSE obtained using the bootstrap mechanism.

The critical part of the experiment was to design an effective computer code to perform the Monte Carlo simulation. The FORTRAN program developed to carry out the simulation reported here is listed in Appendix B. This program was used to analyze the performance of eight different estimators based on the bootstrap methodology. These were the sample mean, the variance (three different estimators), the coefficient of correlation, the coefficient of variation, the five-percent trimmed mean, and the median.

The simulation runs as follows (see Appendix B):

(1) $n$ random variates, for up to 8 values of $n$, are first generated, representing a random sample from a population $F$. (In the simulation, a total of $N$ random variables are first generated and then sectioned into samples of sizes $n_i$, where $i = 1, 2, \ldots, 8$.)

(2) For each subsample of size $n_i$, a bootstrap function is called to generate a bootstrap sample from the original sample. Then the estimator function is called to produce the desired estimate. This step is repeated until $B$ bootstrap samples from the original sample have been obtained.

(3) After the $B$ estimates have been obtained, the statistics function is called to calculate the mean of these estimates; this number is one of the $\bar\theta^*_j(\hat F)$.

(4) In order to improve the precision of the simulation process, steps (2) and (3) are replicated $M$ times. The process then produces a total of $(N \times M)/n$ estimates. From these estimates, a box-plot is constructed and summary estimates, including the MSE, are calculated.
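The four simulation steps above can be sketched as follows. This is a hypothetical Python rendering, not a transcription of the Appendix B program, and the names `run_experiment`, `draw` and `stat` are made up. It assumes that the same $N$ variates are reused and cut into consecutive samples for every sample size $n$. The text suggests this but does not state it outright.

```python
import random
import statistics


def run_experiment(sample_sizes, N, M, B, stat, draw, theta, rng):
    """Sketch of simulation steps (1)-(4); all names are hypothetical.

    Assumption: the same N variates are reused for every sample size n and
    cut into N // n consecutive samples. For each sample, B bootstrap
    estimates are averaged into one value (step (3)); this is repeated M
    times (step (4)). Returns {n: (number_of_estimates, mse_about_theta)}.
    """
    data = [draw(rng) for _ in range(N)]                   # step (1)
    results = {}
    for n in sample_sizes:
        estimates = []
        for start in range(0, N - n + 1, n):
            sample = data[start:start + n]
            for _ in range(M):                             # step (4)
                boots = [stat(rng.choices(sample, k=n))    # step (2)
                         for _ in range(B)]
                estimates.append(statistics.fmean(boots))  # step (3)
        mse = statistics.fmean([(t - theta) ** 2 for t in estimates])
        results[n] = (len(estimates), mse)
    return results


rng = random.Random(5)
res = run_experiment([10, 20, 40], N=400, M=5, B=20, stat=statistics.fmean,
                     draw=lambda r: r.expovariate(1.0), theta=1.0, rng=rng)
for n, (count, mse) in res.items():
    print(n, count, round(mse, 4))
```

With $N = 400$ and $M = 5$, the counts printed for $n$ = 10, 20 and 40 are 200, 100 and 50, matching $(N \times M)/n$.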

In the next chapter, some of the results obtained from this simulation process are analyzed.
III.  APPLICATION OF THE BOOTSTRAP METHOD: SOME RESULTS

A.      THE  MEAN,  VARIANCE  AND  THE  COEFFICIENT  OF  VARIATION  OF 
EXPONENTIAL  RANDOM  VARIATES 

The first experiment was intended to analyze how well the bootstrap mechanism estimates the MSE of three estimators. These are the estimators of the mean, variance and coefficient of variation of a sample drawn from a population of exponential random variates with parameter $\lambda = 1$. The population coefficient of variation is defined as:

$$\mathrm{CV}(X) = \sigma_X / \mu_X. \qquad (3.1)$$

In the Exponential(1) case, the mean, the variance and the coefficient of variation all have the same value, 1. With this fact in mind, the MSE of the sample mean, as an example, is defined using (2.21) and (2.28) as:

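$$\mathrm{MSE}(\bar X^*) = \mathrm{Var}(\bar X^*) + [E(\bar X^*) - \mu_X]^2. \qquad (3.2)$$

Conditionally, from (2.24), an estimate of (3.2) is:

$$\widehat{\mathrm{MSE}}^*(\bar X^*) = \frac{n-1}{n^2} S_X^2 + [\bar\theta^*(\hat F) - 1]^2, \qquad (3.3)$$

where $\bar\theta^*(\hat F)$ is the average (2.30) of the $B$ bootstrap sample means.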
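Equation (3.3) mixes an exact term, the conditional variance (2.24), with a Monte Carlo term: the squared deviation of the averaged bootstrap means from the known mean 1. The Python sketch below assumes that the bias term uses the average of the $B$ bootstrap sample means. It computes (3.3) for one Exponential(1) sample with $n = 25$. It also compares the result with its limit as $B \to \infty$, which by (2.20) is $\frac{n-1}{n^2}S_X^2 + (\bar X - 1)^2$. The function name and seed are hypothetical.

```python
import random
import statistics


def mse_star_mean_exp1(x, B, rng):
    """Equation (3.3) for the bootstrap sample mean of an Exponential(1) sample.

    Variance term: exact conditional variance (2.24), (n - 1) / n**2 * S_X^2.
    Bias term (assumed form): average of B bootstrap sample means minus the
    known population mean 1.
    """
    n = len(x)
    var_term = (n - 1) / n ** 2 * statistics.variance(x)
    boot_mean = statistics.fmean(
        [statistics.fmean(rng.choices(x, k=n)) for _ in range(B)])
    return var_term + (boot_mean - 1.0) ** 2


rng = random.Random(8)
x = [rng.expovariate(1.0) for _ in range(25)]
n = len(x)
limit = (n - 1) / n ** 2 * statistics.variance(x) + (statistics.fmean(x) - 1.0) ** 2
for B in (5, 20, 100, 500, 5000):
    print(B, round(mse_star_mean_exp1(x, B, rng), 5), round(limit, 5))
```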


In the same manner, the MSE of the variance and coefficient of variation estimators can be estimated. These estimates were obtained using the algorithm described in the preceding section. The sample sizes for this experiment were $n$ = 10, 20, 25, 40, 50, 70, 100, 140. Each estimator was bootstrapped using $B$ = 5, 8, 10, 15, 20, 25, 40, 60, 100, 140, and 500. Figures 3.1, 3.2 and 3.3 show how the $\widehat{\mathrm{MSE}}^*$ of the mean, variance and coefficient of variation, respectively, decreases as both $n$ and $B$ increase.

A remarkable feature of these plots is that the $\widehat{\mathrm{MSE}}^*$ of the bootstrap sample variance (Figure 3.2) decreases much faster as the sample size increases than as $B$ increases. Observe the large drop in $\widehat{\mathrm{MSE}}^*$ when $n$ goes from 10 to 40, relative to that when $B$ goes from 5 to, say, 40: the drop is much greater in the former case.

Another observation of interest is that the $\widehat{\mathrm{MSE}}^*$ of the estimates decreases as $B$ increases, but very slowly beyond a certain threshold. Indeed, the decrease in $\widehat{\mathrm{MSE}}^*$ beyond $B \approx 50$ is barely noticeable. For example, in Figure 3.2 the $\widehat{\mathrm{MSE}}^*$ of the sample variance decreases by only one-thousandth of a unit when $B$ is increased from 200 to 500 replications. This is also true for the sample mean. However, for the coefficient of variation (see Figure 3.3), the $\widehat{\mathrm{MSE}}^*$ improved by about 0.02 over the same range for a small sample size ($n = 10$). These results give an idea of the behavior of the MSE of the bootstrap estimates of a given estimator. They should also suggest something to the statistician: once the estimators are performing fairly well (i.e., once this threshold has been attained), there is no reason to increase the number of bootstrap replications, since doing so will not substantially improve the estimates. An important point here concerns using the bootstrap method to estimate the sample variance: the number of bootstrap replications should then be greater than 100 in order to decrease the $\widehat{\mathrm{MSE}}^*$ below 0.6.

Figure 3.2  $\widehat{\mathrm{MSE}}^*$ of Bootstrap Sample Variance: Exp(1).

Figure 3.3  $\widehat{\mathrm{MSE}}^*$ of Bootstrap Coeff. of Variation: Exp(1).

The bootstrap distributions of some of the estimators are shown in Figures 3.4 and 3.5 in the form of boxplots, together with a summary of the distributional statistics. These were obtained using a statistical package called SMTBIO, developed at NPGS (see Appendix B), which the author modified in order to obtain $\widehat{\mathrm{MSE}}^*$. Each boxplot represents the distribution of the bootstrap estimator for a given sample size $n$.
Figure 3.4  Bootstrap Dist. of Sample Mean  B = 5.
Notice in Figure 3.4 that the distribution of the bootstrap sample mean resembles a Normal distribution, as would be expected from the Central Limit Theorem, with the (excess) kurtosis and skewness oscillating around zero as $n$ increases. Recall from the previous section that the standard deviation of $\bar X^*$, in the case of Figure 3.4, would be estimated by

$$\widehat{\mathrm{STD}}^*(\bar X^*) = \widehat{\mathrm{STD}}^* / \sqrt{n^*}, \qquad n^* = N \times M / \mathrm{NE}(I),$$

where $n^*$ is the number $(N \times M)/n$ of estimates produced in step (4) of the simulation and $\widehat{\mathrm{STD}}^*$ is the value shown in the bottom table of this figure. Figure 3.5 shows the distribution of the bootstrap sample variance. Looking at the distribution summary, this distribution appears quite similar to that of a scaled Gamma($k$, $\beta$) distribution. Again, as $n$ increases, the kurtosis and skewness get closer to those of the Gamma, namely $6/k$ and $2/\sqrt{k}$, respectively. Figures B.4 and B.5 in Appendix B show the distributions of the same estimators when $B$ = 150. Their distributional characteristics clearly follow the same patterns as those discussed above for $B$ = 5. The only difference is that, as expected, the number of outliers decreases significantly, particularly in the case of the sample variance.
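For reference, a Gamma($k$, $\beta$) distribution has skewness $2/\sqrt{k}$ and excess kurtosis $6/k$, whatever the scale $\beta$. The short Python check below compares these values with moment estimates from simulated Gamma($k$, 1) variates. The values of $k$, the sample size and the seed are arbitrary.

```python
import random


def skew_kurt(data):
    """Moment estimates of skewness and excess kurtosis."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((v - m) ** 2 for v in data) / n
    m3 = sum((v - m) ** 3 for v in data) / n
    m4 = sum((v - m) ** 4 for v in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


rng = random.Random(3)
for k in (1.0, 4.0, 16.0):
    g1, g2 = skew_kurt([rng.gammavariate(k, 1.0) for _ in range(100_000)])
    print(f"k={k:>4}: skewness {g1:.3f} vs {2 / k ** 0.5:.3f}, "
          f"excess kurtosis {g2:.3f} vs {6 / k:.3f}")
```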

B.       THE  SAMPLE  VARIANCE 

This experiment was intended to further study the behavior of the bootstrap sample variance for populations with various distributions. The ones discussed in this section are the GAMMA(0.5,1), NORMAL(0,1) and LAPLACE(0,1). For this experiment, the sample sizes were $n$ = 5, 10, 15, 20, 25, 30, 50, 60, and $B$ = 5, 8, 10, 15, 20, 25, 30, 35, 40, 50, 100, and 500. In the first two cases, the GAMMA and NORMAL distributions, the bootstrap sample variance seems to approximate the population variance fairly well when $n > 50$, where the $\widehat{\mathrm{MSE}}^*$ is less than 0.10. Figures 3.6, 3.7, and 3.8 show the relation between $B$, $n$, and the $\widehat{\mathrm{MSE}}^*$ of the bootstrap sample variance for a Gamma(0.5,1), Normal(0,1), and Laplace(0,1), respectively.

Notice that there is a lot of random variation in the $\widehat{\mathrm{MSE}}^*$ when $B$ is in the range $5 < B < 50$ for $n \le 30$, and for $B < 25$ when $30 < n \le 60$. This random noise extends beyond these ranges in the case of the Gamma(0.5,1). Notice that in Figure 3.6, for $B < 300$, the lines for the $\widehat{\mathrm{MSE}}^*$ of the sample variance when $n$ = 15 and 20 lie above the line for $n$ = 10. However, when $B$ = 500, these lines lie below the one corresponding to $n$ = 10. The $\widehat{\mathrm{MSE}}^*$ for $n$ = 15 and 20 is actually less than the $\widehat{\mathrm{MSE}}^*$ for $n$ = 10 just after $B > 150$. In this experiment it is also true, as found for the Exponential(1), that $\widehat{\mathrm{MSE}}^*$ decreases faster as $n$ increases than as $B$ increases. This was also the result in the case of the Laplace(0,1). However (notice the scale of the MSE in this case), the $\widehat{\mathrm{MSE}}^*$ is quite high. Figure 3.8 shows that for a sample of size $n \le 15$, $\widehat{\mathrm{MSE}}^* > 1.0$ even when $B$ is as large as 500. It was suspected that this high $\widehat{\mathrm{MSE}}^*$ was caused by the mechanism used to generate Laplace random variates. The first method used in this experiment takes the difference of two Exponential(1) variates. The second method generates an Exponential(1) variate and converts it to a Negative-Exponential(1) variate with probability 0.5. The histograms, using different sample sizes, showed that the first algorithm was the more effective of the two. In any case, the point here concerns the ranges of $n$ and $B$ used in the experiment. Over those ranges, the $\widehat{\mathrm{MSE}}^*$ of the sample variance for a Laplace(0,1) never decreased below 0.2. This was not the case for the other distributions. The performance of the bootstrap method therefore appears to depend on the distributional properties of the population in question as well as on the estimator under consideration.

Figure 3.5  Bootstrap Dist. of Sample Variance  B = 5.

Figure 3.6  $\widehat{\mathrm{MSE}}^*$ of Bootstrap Sample Variance of a G(0.5,1).

Figure 3.7  $\widehat{\mathrm{MSE}}^*$ of Bootstrap Sample Variance of a N(0,1).

Figure 3.8  $\widehat{\mathrm{MSE}}^*$ of Bootstrap Sample Variance of a L(0,1).
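The two Laplace generators described above can be sketched in Python as follows; the function names are hypothetical. In theory both constructions produce exactly Laplace(0,1) variates, with density $e^{-|x|}/2$, mean 0 and variance 2. This population variance is twice that of the N(0,1), which is worth keeping in mind when comparing the MSE scales of Figures 3.7 and 3.8.

```python
import random
import statistics


def laplace_by_difference(rng):
    """First method: difference of two independent Exponential(1) variates."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


def laplace_by_random_sign(rng):
    """Second method: an Exponential(1) variate negated with probability 0.5."""
    e = rng.expovariate(1.0)
    return -e if rng.random() < 0.5 else e


rng = random.Random(17)
for gen in (laplace_by_difference, laplace_by_random_sign):
    xs = [gen(rng) for _ in range(100_000)]
    # Laplace(0, 1) has mean 0 and variance 2.
    print(gen.__name__, round(statistics.fmean(xs), 3), round(statistics.pvariance(xs), 3))
```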
C.  THREE DIFFERENT ESTIMATORS FOR THE VARIANCE

In Chapter Two, the expected value and the variance of the bootstrap sample mean ($\bar X^*$) were derived. In this section, the expected value of the bootstrap sample variance, call it ${}_1S^{*2}$, is calculated. Let

$$\begin{aligned} {}_1S^{*2} &= \Big[\sum_i (X_i^* - \bar X^*)^2\Big] \Big/ (n - 1) \\ &= \Big[\sum_i X_i^{*2} - n\bar X^{*2}\Big] \Big/ (n - 1). \end{aligned} \qquad (3.4)$$
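Note that

$$E^*(X_i^{*2}) = \frac{1}{n}\sum_k x_k^2, \qquad (3.5)$$

so that

$$E^*\Big(\sum_i X_i^{*2}\Big) = \sum_i x_i^2. \qquad (3.6)$$

Likewise, the second moment of $\bar X^*$ is given by:

$$E^*(\bar X^{*2}) = \frac{1}{n^2}\Big[\sum_i E^*(X_i^{*2}) + \sum_{i \neq j} E^*(X_i^* X_j^*)\Big]. \qquad (3.7)$$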

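As before, for $i \neq j$, $(X_i^*, X_j^*)$ has probability $1/n^2$ of being any point of the form $(x_k, x_l)$, so from (2.12)

$$E^*(X_i^* X_j^*) = \frac{1}{n^2}\Big[\sum_k x_k^2 + \sum_{k \neq l} x_k x_l\Big] = \bar X^2, \qquad i \neq j. \qquad (3.8)$$

Now

$$\sum_{i \neq j} E^*(X_i^* X_j^*) = \frac{n(n-1)}{n^2}\Big[\sum_k x_k^2 + \sum_{k \neq l} x_k x_l\Big] = \frac{n-1}{n}\Big(\sum_k x_k\Big)^2 = n(n-1)\bar X^2. \qquad (3.9)$$

Then (3.7) can be expressed as

$$E^*(\bar X^{*2}) = \frac{1}{n^2}\Big[\sum_k x_k^2 + n(n-1)\bar X^2\Big]. \qquad (3.10)$$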

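Finally, using (3.6) and (3.9), the conditional expected value of ${}_1S^{*2}$ is

$$\begin{aligned} E^*({}_1S^{*2}) &= \frac{1}{n-1}E^*\Big(\sum_i X_i^{*2} - n\bar X^{*2}\Big) \\ &= \frac{1}{n-1}\Big[\sum_i E^*(X_i^{*2}) - nE^*(\bar X^{*2})\Big] \\ &= \frac{1}{n-1}\Big\{\sum_k x_k^2 - \frac{1}{n}\Big[\sum_k x_k^2 + n(n-1)\bar X^2\Big]\Big\} \\ &= \frac{1}{n-1}\Big[\frac{n-1}{n}\sum_k x_k^2 - (n-1)\bar X^2\Big] \\ &= \sum_k (x_k - \bar X)^2 / n. \end{aligned} \qquad (3.11)$$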
Call this $\hat\sigma_X^2$. Now suppose it is known that $X \sim N(\mu, \sigma^2)$ (this restriction is not really required in this context) and that it is desired to estimate the variance of $X$ using the bootstrap method. As shown in the previous chapter,

$$E(\bar X^*) = \mu_X, \qquad (3.12)$$

so the unconditional expected value of ${}_1S^{*2}$ is:

$$E({}_1S^{*2}) = E_X[E({}_1S^{*2} \mid X)] = E\Big[\sum_k (X_k - \bar X)^2 / n\Big] = \frac{n-1}{n}\sigma_X^2. \qquad (3.13)$$

Then ${}_1S^{*2}$ is a biased estimator of $\sigma_x^2$. The correction factor $n/(n-1)$ might thus be suggested to improve the performance of ${}_1S^{*2}$. Define

$$ {}_2S^{*2} = (n/(n-1))\, {}_1S^{*2} = (n/(n-1)^2) \sum_i (X_i^* - \bar{X}^*)^2, \qquad (3.14) $$


an unbiased bootstrap estimator of $\sigma_x^2$. Analyzing expressions (2.5) and (3.11), yet another estimator of $\sigma_x^2$ can be suggested. Since the value of $E_*(\bar{X}^*) = \bar{X}$ is known, the following estimator of $\sigma_x^2$ also seems reasonable:

$$ {}_3S^{*2} = \sum_i (X_i^* - \bar{X})^2 / n. \qquad (3.15) $$
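As a check on (3.11), the conditional expectations of ${}_1S^{*2}$ and ${}_3S^{*2}$ can be computed exactly for a tiny sample by enumerating all $n^n$ equally likely bootstrap samples. The Python sketch below does this with exact rational arithmetic; the sample values and the function name `exact_bootstrap_expectations` are illustrative choices, not part of the thesis.

```python
from fractions import Fraction
from itertools import product


def exact_bootstrap_expectations(sample):
    """Average 1S*^2 (3.4) and 3S*^2 (3.15) over all n**n equally likely
    bootstrap samples drawn with replacement from `sample`."""
    x = [Fraction(v) for v in sample]
    n = len(x)
    xbar = sum(x) / n                          # original sample mean, fixed
    total_1s = Fraction(0)
    total_3s = Fraction(0)
    for idx in product(range(n), repeat=n):    # every bootstrap sample
        xs = [x[i] for i in idx]
        xs_bar = sum(xs) / n                   # bootstrap sample mean
        total_1s += sum((v - xs_bar) ** 2 for v in xs) / (n - 1)
        total_3s += sum((v - xbar) ** 2 for v in xs) / n
    count = n ** n
    return total_1s / count, total_3s / count


sample = [1, 2, 4, 7]
e1, e3 = exact_bootstrap_expectations(sample)
xbar = Fraction(sum(sample), len(sample))
target = sum((Fraction(v) - xbar) ** 2 for v in sample) / len(sample)
print(e1, e3, target)   # 21/4 21/4 21/4
```

For the sample (1, 2, 4, 7) both averages equal $\sum_i (X_i - \bar{X})^2/n = 21/4$, in agreement with (3.11). The same enumeration shows that ${}_3S^{*2}$ has the same conditional expectation as ${}_1S^{*2}$, so the two differ in their variability rather than in their conditional mean.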

The third experiment was conducted to compare the performance of these three estimators, (3.4), (3.14), and (3.15). Figures 3.9, 3.10, and 3.11 show the results of this experiment.

As can be seen, the third estimator, ${}_3S^{*2}$, outperforms the other two in almost all cases, for every sample size tried in this experiment. The second estimator (3.14) performs almost as well as ${}_3S^{*2}$ when $n > 50$, and for $n > 50$ the difference among the three estimators is barely noticeable. However, for very small samples, $n < 20$, ${}_3S^{*2}$ is definitely a better estimator of $\sigma^2$ than ${}_1S^{*2}$.

Figure 3.9    MSE* of the Sample Variance of a N(0,1) (curves for N = 5, 10, 15, 20, 25, 30, 50, 60; horizontal axis: B = bootstrap replications).

Figure 3.10    MSE* of the 2nd Variance Estimator of a N(0,1) (curves for N = 5, 10, 15, 20, 25, 30, 50, 60; horizontal axis: B = bootstrap replications).

Figure 3.11    MSE* of the 3rd Variance Estimator of a N(0,1).

Efron [Ref. 1] has suggested the use of ${}_1S^{*2}$ as the bootstrap estimator of the sample variance. As the plots suggest, ${}_3S^{*2}$, and even ${}_2S^{*2}$ (for larger samples, $n > 50$), could now be recommended instead of ${}_1S^{*2}$ to estimate the sample variance. Note that as $n \to \infty$, ${}_2S^{*2}$ becomes the same as ${}_1S^{*2}$, since $n/(n-1) \to 1$. (Note: the estimators (3.14) and (3.15) are called VARI2 and VARI3, respectively, in the FORTRAN code listed in Appendix B.)
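The third experiment can be sketched in Python as follows. This is an illustration under stated assumptions, not the FORTRAN program that produced Figures 3.9 to 3.11: each bootstrap estimate is taken to be the average of the estimator over B replications (as the FORTRAN functions VARIA, VARI2 and VARI3 appear to do), MSE* is taken to be the mean squared error against the true value $\sigma^2 = 1$ over M independent N(0,1) samples, and the values of M and the seed are arbitrary.

```python
import random
import statistics


def bootstrap_variance_estimates(sample, B, rng):
    """Bootstrap estimates of sigma^2 from (3.4), (3.14) and (3.15),
    each averaged over B bootstrap replications."""
    n = len(sample)
    xbar = statistics.fmean(sample)                 # original mean, fixed for (3.15)
    s1 = s2 = s3 = 0.0
    for _ in range(B):
        xs = rng.choices(sample, k=n)               # resample with replacement
        xs_bar = statistics.fmean(xs)               # bootstrap sample mean
        ss = sum((v - xs_bar) ** 2 for v in xs)
        s1 += ss / (n - 1)                          # 1S*^2, eq. (3.4)
        s2 += n * ss / (n - 1) ** 2                 # 2S*^2, eq. (3.14)
        s3 += sum((v - xbar) ** 2 for v in xs) / n  # 3S*^2, eq. (3.15)
    return s1 / B, s2 / B, s3 / B


def mse_of_variance_estimators(n, B, M=200, seed=1):
    """Average squared error of the three estimators over M N(0,1) samples;
    the true variance is 1."""
    rng = random.Random(seed)
    sq_err = [0.0, 0.0, 0.0]
    for _ in range(M):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for k, est in enumerate(bootstrap_variance_estimates(sample, B, rng)):
            sq_err[k] += (est - 1.0) ** 2
    return [e / M for e in sq_err]


for n in (5, 20, 60):
    print(n, [round(m, 3) for m in mse_of_variance_estimators(n, B=40)])
```

Each printed row gives, for one sample size, the simulated MSE* of ${}_1S^{*2}$, ${}_2S^{*2}$ and ${}_3S^{*2}$ at B = 40; looping over B instead traces curves of the kind shown in the figures, though no particular values are claimed here.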

D. THE CENTER OF A DISTRIBUTION: COMPARISON OF THE MEAN,
MEDIAN AND TRIMMED MEAN

The sample mean is the most commonly used estimator of the center of a distribution. However, two other estimators are also used, especially for symmetric distributions: the median and the 5% trimmed mean. There have been many comparisons of the asymptotic performance of these three estimators. Lehmann [Ref. 8] has calculated the asymptotic variances of these estimators when the sample is from a Normal(0,1) or a Laplace(0,1) population. These calculations are summarized in Table 1 below.
TABLE 1

ASYMPTOTIC VARIANCE OF THE MEAN, MEDIAN
AND 5% TRIMMED MEAN

                                       ESTIMATOR
Probability
Distribution        Mean        Median        5% Trimmed Mean

Normal(0,1)         1.0/n       1.57/n        1.01/n
Laplace(0,1)        2.0/n       1.00/n        1.65/n

These values show, among other things, that for a sample coming from a Normal(0,1), the mean has smaller asymptotic variance than the other estimators. However, if the data come from a population with heavy tails, like the Laplace, the median is asymptotically the better estimator (having smaller variance). The 5% trimmed mean is a compromise between the other two: it should be used when the practitioner does not know the nature of the tails of the population.
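Two columns of Table 1 can be checked against standard asymptotic results: $n\,\mathrm{Var}(\bar{X}) = \mathrm{Var}(X)$, which is 1 for the Normal(0,1) and 2 for the Laplace(0,1), and $n\,\mathrm{Var}(\text{median}) \to 1/(4f(0)^2)$ when the density $f$ is symmetric about 0. The short Python computation below evaluates these; it does not attempt the 5% trimmed mean column, whose asymptotic variance needs a more involved integral.

```python
import math


def median_asymptotic_variance(f_at_median):
    """Limit of n * Var(sample median): 1 / (4 f(m)^2), f the population density."""
    return 1.0 / (4.0 * f_at_median ** 2)


normal_f0 = 1.0 / math.sqrt(2.0 * math.pi)   # N(0,1) density at 0
laplace_f0 = 0.5                             # Laplace(0,1) density exp(-|x|)/2 at 0

# (n * variance of the mean, n * asymptotic variance of the median)
table = {
    "Normal(0,1)": (1.0, median_asymptotic_variance(normal_f0)),
    "Laplace(0,1)": (2.0, median_asymptotic_variance(laplace_f0)),
}
for dist, (mean_c, median_c) in table.items():
    print(f"{dist:13s} mean {mean_c:.2f}/n   median {median_c:.2f}/n")

# Asymptotic MSE of the sample mean of N(0,1) data for n = 5 (unbiased: zero bias)
mse_a_mean_n5 = table["Normal(0,1)"][0] / 5 + 0.0 ** 2
print(round(mse_a_mean_n5, 2))   # 0.2
```

The median entries come out as $\pi/2 \approx 1.57$ and 1.00, matching the table, and the last line reproduces the value 0.20 used for the sample mean with n = 5.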

A fourth experiment was conducted to see whether these observations hold when the corresponding bootstrap estimators are used. In this experiment, the MSE of the bootstrap estimators was compared with the asymptotic MSE of the usual estimators as B increases. The asymptotic MSE (call it MSE_a) of each of the three estimators can be estimated by adding the squared bias to the asymptotic variance given in Table 1. MSE_a was compared with the MSE* of the bootstrap estimators, for several sample sizes, as B increases.

Figures 3.12, 3.13, and 3.14 summarize the results of this comparison for the case of a Normal(0,1) population. Figures 3.15, 3.16, and 3.17 show the results for a Laplace(0,1) population.

In these figures, the solid horizontal lines represent the values of the asymptotic MSE of the usual estimators. For example, in Figure 3.12 the estimated asymptotic MSE of the sample mean for a sample of size n = 5 is approximately 1/5.0 + (BIAS)² ≈ 0.20. The dotted line represents the estimated MSE of the bootstrapped estimators as B increases.
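A Python sketch of this fourth experiment follows. It is illustrative only and rests on assumptions not stated in the thesis: each bootstrap estimate is the average of the estimator over B resamples, MSE* is the mean squared error about the true center 0 over M samples, the 5% trimmed mean drops `int(0.05 * n)` points from each end, and M and the seed are arbitrary. Laplace(0,1) variates are generated as the difference of two Exponential(1) variates, the same idea used by the FORTRAN subroutine LAPLA.

```python
import random
import statistics


def trimmed_mean(xs, alpha=0.05):
    """Mean after dropping int(alpha * n) points from each end
    (an assumed trimming rule; the thesis does not state one)."""
    xs = sorted(xs)
    k = int(alpha * len(xs))
    return statistics.fmean(xs[k:len(xs) - k])


def normal(rng):
    return rng.gauss(0.0, 1.0)


def laplace(rng):
    """Laplace(0,1) as the difference of two Exponential(1) variates."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)


ESTIMATORS = {"mean": statistics.fmean, "median": statistics.median,
              "5% trimmed": trimmed_mean}


def bootstrap_center_mse(draw, n, B, M=200, seed=7):
    """MSE* about the true center 0 of each bootstrapped estimator; each
    bootstrap estimate is the average of the estimator over B resamples."""
    rng = random.Random(seed)
    sq_err = dict.fromkeys(ESTIMATORS, 0.0)
    for _ in range(M):
        sample = [draw(rng) for _ in range(n)]
        resamples = [rng.choices(sample, k=n) for _ in range(B)]
        for name, est in ESTIMATORS.items():
            boot = statistics.fmean([est(r) for r in resamples])
            sq_err[name] += boot ** 2            # squared error about 0
    return {name: s / M for name, s in sq_err.items()}


for label, draw in (("N(0,1)", normal), ("L(0,1)", laplace)):
    print(label, bootstrap_center_mse(draw, n=20, B=40))
```

Each printed value can be compared with the corresponding MSE_a from Table 1; for n = 20 these are 1/20 = 0.05 (mean) and $\pi/40 \approx 0.079$ (median) for the Normal(0,1), and 0.10 and 0.05 for the Laplace(0,1).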

In summary, for the Normal(0,1) population, the bootstrapped sample mean and the bootstrapped 5% trimmed mean have less error asymptotically; they estimate the center of the distribution with much better precision than the bootstrapped sample median. Comparing Figures 3.12 and 3.13, it appears that for sample sizes n < 60 the bootstrapped sample mean has a much smaller MSE than the bootstrapped sample median. When the sample size is n = 60 there is no distinguishable difference between the estimated MSEs of these two estimators. Notice that the bootstrapped 5% trimmed mean (Figure 3.14) seems to perform as well as the bootstrapped sample mean, and it is better for very small samples, say n = 5, 10, and 15. This confirms that the general relationship among these estimators holds even when the estimators are bootstrapped: the 5% trimmed mean is a robust compromise between the sample mean and the sample median.

Figure 3.12    Asymptotic MSE of the Sample Mean of a N(0,1).

Figure 3.13    Asymptotic MSE of the Sample Median of a N(0,1).

Figure 3.14    Asymptotic MSE of the Sample 5% Trimmed Mean of a N(0,1).

Figure 3.15    Asymptotic MSE of the Sample Mean of a L(0,1).

Figure 3.16    Asymptotic MSE of the Sample Median of a L(0,1).

Figure 3.17    Asymptotic MSE of the Sample 5% Trimmed Mean of a L(0,1).

The results obtained in this experiment, however, do not agree with classical theory in the case of the Laplace population. In this case the bootstrapped sample mean outperforms the bootstrapped sample median in estimating the center of the distribution for sample sizes n ≤ 20. For a sample of size n = 60, there is no real difference between these two estimators in terms of MSE*. Notice that the bootstrapped 5% trimmed mean (Figure 3.17) performs better than the bootstrapped sample median (Figure 3.16) for n < 60, but is in turn outperformed by the bootstrapped sample mean (Figure 3.15).

E. LINEAR REGRESSION BY BOOTSTRAPPING THE RESIDUALS

In a final experiment, linear regression estimation was considered. In this case there is a choice of bootstrapping methods; however, only one method is considered in this thesis. The method considered here relies on bootstrapping the residuals to estimate the variance of the $\hat{\beta}$ vector ($\hat{\beta}$ stands for "β hat"). A measure to estimate the MSE of this vector is also introduced.

In the typical linear regression problem there are n independent real-valued observations $Y_i$, and it is assumed that the following model holds:

$$ Y = X\beta + \varepsilon, \qquad (3.16) $$

where $\varepsilon$ is a random sample from some population F, and $\beta$ is a $p \times 1$ vector of unknown parameters that must be estimated. All that is assumed about F is that it is centered at zero: $E(\varepsilon) = 0$ and $\mathrm{Cov}(\varepsilon) = \sigma^2 I$. One way of estimating $\beta$ is the commonly used least squares method, in which the sum of the squared distances between the $y_i$ and the predicted values $\hat{y}_i$ is minimized. When this fitting technique is used, the estimate $\hat{\beta}$ is obtained by choosing $\hat{\beta}$ such that

$$ \hat{\beta} : \min_{\beta} \sum_i (y_i - \hat{y}_i)^2. \qquad (3.17) $$

Then, as is well known,

$$ \hat{b} = (X'X)^{-1} X'Y, \qquad (3.18) $$

where $X'$ stands for the transpose of $X$. Also, the vector $\varepsilon$ can be estimated by the vector of residuals,

$$ \hat{e}_i = Y_i - \hat{y}_i. \qquad (3.19) $$

It is desired to determine the precision of the estimator $\hat{\beta}$. The bootstrap method can now be applied to estimate the variability and the MSE of the vector $\hat{\beta}$ by bootstrapping the residuals. [As a remark, the second method discussed by Efron [Refs. 1, 4: Section 7.2, 7] considers each covariate-response pair $Z_i = (Y_i, X_i)$ to be a single data point in the $(p+1)$-dimensional space, and obtains bootstrap samples by sampling these points randomly from $\hat{F}$. Therefore, this method does not condition on $X$ and does not presuppose that the model (3.16) is correct; it estimates the joint distribution of $Y$ and $X$. The algorithm presented in Chapter Two can then be used to estimate the covariance matrix of $\hat{\beta}$.]

The bootstrapping algorithm explained in Chapter Two, Section A.2, can be used as follows:

(1) Construct $\hat{F}$ by giving mass $1/n$ to each observed residual, and sample from $\hat{F}$ to obtain bootstrap samples $\varepsilon_i^* \overset{iid}{\sim} \hat{F}$.

(2) Construct new data $Y^*$, called the bootstrap data set, using $\varepsilon^*$ and $\hat{\beta}$:

$$ Y^* = X\hat{\beta} + \varepsilon^*. \qquad (3.20) $$

(3) Using the same fitting technique used to obtain $\hat{\beta}$ in the original problem, calculate $\hat{\beta}^*$; that is, obtain an estimate of $\hat{\beta}^*$:

$$ b^* = (X'X)^{-1} X'Y^*. \qquad (3.21) $$

(4) Repeat steps (2) and (3) B times, obtaining independent bootstrap realizations $b_1^*, b_2^*, \ldots, b_B^*$. Then the covariance of $\hat{\beta}^*$ can be estimated by the sample covariance matrix of the $b_b^*$, $b = 1, 2, \ldots, B$.
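Steps (1) to (4) can be sketched in Python for a general design matrix. The helper names (`solve`, `least_squares`, `residual_bootstrap`), the small Gauss-Jordan solver, and the straight-line data set are assumptions made for the illustration. For large B, the diagonal of the bootstrap covariance should approach $\hat{\sigma}^2 (X'X)^{-1}$ with $\hat{\sigma}^2 = \sum_i \hat{e}_i^2/n$, which is the right-hand side of (3.22) written with $\bar{\sigma}^2 = \sum_i \hat{e}_i^2/(n-p)$.

```python
import random


def solve(a, b):
    """Solve a x = b for a small square matrix a (Gauss-Jordan, partial pivoting)."""
    p = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]


def least_squares(X, y):
    """b = (X'X)^{-1} X'y, eq. (3.18), by solving the normal equations."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def residual_bootstrap(X, y, B, seed=0):
    """Steps (1)-(4): returns b-hat, the residuals, the B vectors b*, and
    the sample covariance matrix of the b*."""
    rng = random.Random(seed)
    b_hat = least_squares(X, y)
    fitted = [sum(x * b for x, b in zip(row, b_hat)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]            # eq. (3.19)
    boots = []
    for _ in range(B):
        e_star = rng.choices(resid, k=len(resid))             # step (1): sample F-hat
        y_star = [f + e for f, e in zip(fitted, e_star)]      # step (2), eq. (3.20)
        boots.append(least_squares(X, y_star))                # step (3), eq. (3.21)
    p = len(b_hat)
    mean = [sum(b[j] for b in boots) / B for j in range(p)]
    cov = [[sum((b[i] - mean[i]) * (b[j] - mean[j]) for b in boots) / (B - 1)
            for j in range(p)] for i in range(p)]             # step (4)
    return b_hat, resid, boots, cov


# Usage on a small illustrative straight-line data set
gen = random.Random(42)
X = [[1.0, float(t)] for t in range(10)]
y = [2.0 + 0.5 * t + gen.gauss(0.0, 1.0) for t in range(10)]
b_hat, resid, boots, cov = residual_bootstrap(X, y, B=2000, seed=1)
print(b_hat, cov[0][0], cov[1][1])
```

Because the fit includes an intercept, the residuals sum to zero, so the resampled residuals are centered as the model assumption $E(\varepsilon) = 0$ requires.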

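Efron has shown (see [Ref. 1: page 18]) that as $B \to \infty$,

$$ \mathrm{Var}(\hat{\beta}^*) = ((n-p)/n)\, (X'X)^{-1}\, \bar{\sigma}^2, \qquad (3.22) $$

where $\bar{\sigma}^2$ is an unbiased estimate of the variance of $Y_i$. In this procedure, $\bar{\sigma}^2$ can be estimated by ${}_2S^{*2}$. Since $(n-p)/n \to 1$ as $n$ grows, it can be seen that for large $n$ (and $B \to \infty$)

$$ \mathrm{Var}(\hat{\beta}^*) \to \mathrm{Var}(\hat{\beta}). \qquad (3.23) $$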

The following experiment was conducted to estimate the MSE of $\hat{\beta}$. Suppose it is known that the observations $Y_i$ come from a Normal(0,1). Then the true value of the $\beta$ vector in the regression model (3.16) is $\beta = (0,0,0)$, so $E(\hat{\beta}) = 0$, and the variance-covariance matrix of $\hat{\beta}$ is $\Sigma_{\hat{\beta}} = \sigma^2 (X'X)^{-1}$, where it is known that $\sigma^2 = 1$.

For this experiment, a design matrix $X$ with orthogonal columns was created. This matrix has 1's in the first column; a series of n alternating 1's and -1's in the second column; and, finally, in the third column (for p = 3), a repeating series of two 1's and two -1's (also, $n = 2^k$, $k = 2, 3, 4, \ldots$). Then $X'X = nI$, and it was possible to readily calculate $\hat{\beta}$ by

$$ \hat{\beta} = (1/n)(X'Y). \qquad (3.24) $$

The bootstrap algorithm described above was used to generate a sample of $\hat{\beta}_j^*$. Then an estimate of $\hat{\beta}_j^*$ is

$$ b_j^* = (1/n)(X'Y_j^*). \qquad (3.25) $$
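The orthogonal design and the shortcut (3.24) can be illustrated with the Python sketch below. The construction mirrors the loops in the FORTRAN function BLREG, which start column 2 with -1 and column 3 with the pair -1, -1; `design_matrix` and `beta_hat` are hypothetical names used only here.

```python
def design_matrix(n):
    """Three orthogonal columns, following the construction in BLREG:
    all 1's; alternating -1, 1; and blocks -1, -1, 1, 1 (n a multiple of 4)."""
    if n % 4:
        raise ValueError("n must be a multiple of 4")
    return [[1.0,
             -1.0 if i % 2 == 0 else 1.0,
             -1.0 if i % 4 < 2 else 1.0] for i in range(n)]


def beta_hat(X, y):
    """Eqs. (3.24)-(3.25): since X'X = nI, the least-squares fit is X'y / n."""
    n = len(y)
    return [sum(row[j] * v for row, v in zip(X, y)) / n for j in range(len(X[0]))]


X = design_matrix(8)
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
print(xtx)                              # [[8.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 8.0]]
print(beta_hat(X, [r[2] for r in X]))   # [0.0, 0.0, 1.0]
```

Because $X'X = nI$, no matrix inversion is needed: each coefficient is just the inner product of a column with $Y$ (or $Y^*$) divided by n.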




It was desired to develop a measure of precision for $\hat{\beta}$ analogous to MSE, which depends on $\mathrm{Var}(\hat{\beta})$ and on the bias of $\hat{\beta}$. Define

$$ \mathrm{MSE}(\hat{\beta}) = E\big[ \lVert \hat{\beta} - \beta \rVert^2 \big]. \qquad (3.26) $$

Recall that in this experiment $E(\hat{\beta}) = \beta = 0$. Then (3.26) can be estimated in the following way:

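1) Do step (4), as above, obtaining

$$ \mathrm{MSE}^*(\hat{\beta}) = \Big[ \sum_{i=1}^{B} \lVert \hat{\beta}_i^* - E(\hat{\beta}) \rVert^2 \Big] / B = \Big[ \sum_{i=1}^{B} \lVert \hat{\beta}_i^* - \beta \rVert^2 \Big] / B. \qquad (3.27) $$

2) Repeat (1) M times to obtain an average MSE* of the procedure (3.27).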
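The estimate (3.27) and its average over M repetitions can be sketched in Python for the orthogonal design with true $\beta = (0, 0, 0)$ and $\sigma^2 = 1$. M = 15 follows the text; the seed and the helper names are assumptions of the sketch. The sketch follows (3.27), the average of the squared lengths $\lVert \hat{\beta}_i^* - \beta \rVert^2$; note that the FORTRAN function BLREG instead returns the squared length of the average bootstrap vector.

```python
import random


def design_matrix(n):
    """Orthogonal design with X'X = nI (hypothetical helper; n a multiple of 4)."""
    return [[1.0, -1.0 if i % 2 == 0 else 1.0, -1.0 if i % 4 < 2 else 1.0]
            for i in range(n)]


def fit(X, y):
    """Least-squares estimate (1/n) X'y, valid because X'X = nI."""
    n = len(y)
    return [sum(row[j] * v for row, v in zip(X, y)) / n for j in range(3)]


def mse_star(X, y, B, rng, beta=(0.0, 0.0, 0.0)):
    """Eq. (3.27): mean over B residual-bootstrap vectors b*_i of ||b*_i - beta||^2."""
    b_hat = fit(X, y)
    fitted = [sum(x * b for x, b in zip(row, b_hat)) for row in X]
    resid = [v - f for v, f in zip(y, fitted)]
    total = 0.0
    for _ in range(B):
        e_star = rng.choices(resid, k=len(y))              # resample residuals
        b_star = fit(X, [f + e for f, e in zip(fitted, e_star)])
        total += sum((bs - bt) ** 2 for bs, bt in zip(b_star, beta))
    return total / B


def average_mse_star(n, B, M=15, seed=3):
    """Step 2): average (3.27) over M independent N(0,1) data sets."""
    rng = random.Random(seed)
    X = design_matrix(n)
    runs = [mse_star(X, [rng.gauss(0.0, 1.0) for _ in range(n)], B, rng)
            for _ in range(M)]
    return sum(runs) / M


for n in (8, 32, 128):
    print(n, round(average_mse_star(n, B=50), 4))
```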
The results of this experiment are shown in Figure 3.18.

Figure 3.18    Estimated Average MSE of $\hat{\beta}^*$.
Here, the sample sizes were taken as n = 4, 8, 16, 32, 64, and 128, and M = 15. The estimator $\hat{\beta}^*$ was bootstrapped B = 5, 10, 15, 20, 30, 40, 50, 100, 150, and 500 times. The results obtained were surprising. When the number of observations is small, n < 33, the MSE* of the estimator is relatively high (MSE* > 0.09) even when B is as large as 500. When n > 65, there is some improvement in the MSE*; in this case, the MSE* is at least 5% lower than when n < 33. It is interesting to see that when n > 65, increasing B from 5 to 500 gives no remarkable gain in the precision of the estimator; the MSE* oscillates around the same value. When n < 33, increasing B by the same amount decreases the MSE*, but by less than 1% of its initial value. It seems that in linear regression estimation the key factor is the size of n and not that of B.

When using this method to estimate the MSE of $\hat{\beta}$, the practitioner must bear in mind that it involves the residual distribution and hence assumes that the linear model is correct.
IV. CONCLUSIONS

As has been shown, the bootstrap is an accurate method for estimating the precision of estimates and for estimating the distribution (or some feature of the distribution) of an estimator. For MSE, the number B required to obtain a certain degree of accuracy will vary, depending mainly on the population (a subject for further study) and on the type of estimator used. It was found that when the sample comes from a population with long, heavy tails, such as the Laplace distribution, the bootstrap estimator of the mean is a better estimator of the center of the distribution than the median or the 5% trimmed mean, whereas with nonbootstrap estimators the median is better than the other two.

In estimating the variance of a population, it was found that there exists an estimator that is more accurate than the typical estimator recommended in the bootstrap literature. This estimator, ${}_3S^{*2}$, relies on the fact that the original sample mean is known in the bootstrap method. Once this value is calculated, there is no need to find $\bar{X}^*$ for each bootstrap sample, since $\bar{X}$ is fixed throughout the process. Another estimator of $\sigma^2$, ${}_2S^{*2}$, was also proposed. This estimator is unbiased, whereas ${}_1S^{*2}$ is not, but for small sample sizes, $n < 30$, it is not as accurate as ${}_3S^{*2}$. It should be emphasized that by using ${}_3S^{*2}$ one can reduce the computer time required to estimate $\sigma^2$; this is another advantage of this estimator.

In linear regression estimation, using (3.27) as the measure of precision, it was found that the bootstrap method analyzed in this thesis gives estimates with small MSE* for relatively small values of B, but only for relatively large sample sizes, n > 60. When the sample size is small, increasing B up to 500 results in a gain of only around 1% in the precision of the estimates. Thus, in linear regression estimation the critical issue for MSE is the sample size. It was also noted that the disadvantage of this method is that it assumes that the model in question is correct.

The result that seems to apply to all cases studied in this work is that, in using the bootstrap method to estimate the MSE of some parameter $\theta$, there really exists a tradeoff between B and n: as n increases, one can significantly decrease B and still get very precise estimates. However, no matter what n is, once some degree of accuracy has been obtained, there is no reason to increase B much further, since this will not produce greater precision in the estimates. In Appendix C, the reader will find tables that provide information about this tradeoff for given estimators and populations. From the figures presented in previous chapters and these tables, a rule of thumb about the relation between n and B can be hypothesized. The following rule seems reasonable: make $B \approx 1000/n$. In almost all cases studied here, this rule yielded estimates with MSE* < 0.05 (note: independent of n, taking 40 < B < 60 will also produce estimates with small MSE*). The only exception is when the population in question was Laplace(0,1). This is an area that needs further study.
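The rule of thumb $B \approx 1000/n$ is simple arithmetic; a hypothetical helper that rounds up to a whole number of replications (the rounding direction is an assumption) gives the following values.

```python
import math


def suggested_replications(n, constant=1000):
    """Rule of thumb B ~ 1000/n, rounded up to a whole number of replications."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return math.ceil(constant / n)


for n in (5, 10, 20, 50, 60):
    print(n, suggested_replications(n))
# 5 200 / 10 100 / 20 50 / 50 20 / 60 17
```

For n = 60 the rule gives only B = 17, below the range 40 < B < 60 that the text also reports as producing small MSE*.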

Finally, it was found that a (possibly not serious) disadvantage of the bootstrap method is the computer time required to obtain the estimates. For example, in estimating the variance of a Gamma(0.5,1) distribution, increasing B from 20 to 100 increased the CPU time on the IBM 3033-A16 system used in this experiment by about 75%. This time increases by at least another 50% if one desires to obtain the distributional characteristics of the estimator (i.e., boxplots). However, in view of the decreasing cost of computer time, this does not seem to be a major obstacle to using this method.
APPENDIX A
LIST OF SPECIAL NOTATIONS

(1) $\hat{\theta}$ : θ-hat, estimator of θ

(2) $\hat{F}$ : empirical probability distribution

(3) $\theta(\hat{F})$ : the value of θ based on the bootstrap method

(4) $X^*$ : a bootstrap random sample

(5) MSE* : estimated MSE based on the bootstrap method

(6) $\hat{\beta}$ : estimator of the p × 1 β-vector

(7) $\hat{b}$ : an estimate of $\hat{\beta}$

(8) $\hat{\beta}^*$ : estimator of β based on the bootstrap method

(9) $b^*$ : an estimate of $\hat{\beta}^*$
APPENDIX B
FORTRAN CODE FOR BOOTSTRAPPING

This program, called BOOTST, was developed to estimate distributional properties of some statistical estimators using the bootstrap method. It can also produce estimates of the MSE of the estimators. The code was written in FORTRAN 77. It can generate a random sample for Monte Carlo simulation or read the sample data through a CALL to a subroutine FDATA (at the end of the code listed below). The user can generate samples from the following distributions: Exponential(λ), Laplace(0,1), Uniform(0,1), Normal(0,1), Gamma(α,1), Poisson(λ), and Geometric(p). The parameters α, λ, and p can be specified by the user within the appropriate function. With this program, the user can study the distributional properties of the following bootstrap estimators: mean, variance, coefficient of variation, serial correlation, median, and the 5% trimmed mean. One can also obtain estimates of the β-vector in linear regression estimation by bootstrapping the residuals (see Chapter Three, Section E). The program is structured in five main sections: the MAIN program, which includes the input requirements; the DATA GENERATION section; the ESTIMATORS definitions; the BOOTSTRAP SAMPLING mechanism; and the STATISTICS section.

The program can be used in two ways. The first makes use of another program called SMTBIO. This code was developed at the NPGS by Prof. P.A.W. Lewis and Mr. Luis Uribe (see [Ref. 9]). It is highly recommended that the user become familiar with the documentation of SMTBIO before attempting to use BOOTST. In general, when using this option, the user must create an input file containing the parameters specified in the input section of BOOTST. Then a CALL is made to SMTBIO, and in turn SMTBIO makes various sequential calls to generate the data, calculate the values of the desired estimators (using the bootstrap mechanism), and produce the statistics. When a call to SMTBIO is made, the user can produce estimates for 1, 2, or 3 different estimators using 1, 2, or 3 sample data generators, or any of the eight possible combinations. The user can also select up to 8 different sample sizes for each estimator. Therefore, in one execution, statistics for up to three different estimators, using up to three different data generators and up to eight different sample sizes, can be obtained using the bootstrap method. These options are controlled in the INPUT requirements of BOOTST. At the end of each execution, BOOTST sends to a printer (or to the screen, depending on the option selected) a file containing boxplots and a summary of the statistics for each estimator. The input requirements are controlled by the user in a file called BOSIN.

The general execution of BOOTST runs as follows:

(1) For each estimator:

(2) Read input requirements (MAIN).

(3) CALL SMTBIO.

(4) CALL data generator (Data Generation Section).

(5) $N = k \times n$ random variates are generated, where $k = 1, 2, \ldots,$ or 8 different sample sizes. The data are then sectioned into samples of size $N(K) = n$. If $M$ repetitions of the process are allowed, a total of $M \times N$ random numbers are obtained. Estimates are calculated for each sample size $N(K)$.

(6) CALL estimator function (Estimator Section).

    Begin generation of estimates:

(7) For I = 1 to B:
        CALL BOOTSTRAP (Bootstrap Section)
        CALL STATISTIC
        Store bootstrap estimates
    CALL STATISTIC
    Store mean of bootstrap estimates

(8) PRODUCE boxplot and statistics.
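The control flow of steps (4) to (8) can be summarized by a compact Python sketch. It is a simplification under assumptions: it draws each sample directly rather than sectioning one long stream of N variates, replaces the boxplot with quartiles from `statistics.quantiles`, and uses hypothetical names (`bootstrap_estimate`, `run_study`) that are unrelated to SMTBIO's actual interface.

```python
import random
import statistics


def bootstrap_estimate(sample, statistic, B, rng):
    """Like one estimator function (e.g. XMEAN): apply `statistic` to B
    bootstrap resamples and return the mean of the B values."""
    n = len(sample)
    return statistics.fmean([statistic(rng.choices(sample, k=n)) for _ in range(B)])


def run_study(generator, statistic, sizes, B, M, seed=11):
    """For each sample size n: M samples (steps 4-5), one bootstrap estimate
    per sample (steps 6-7), summarized by quartiles instead of a boxplot (step 8)."""
    rng = random.Random(seed)
    summary = {}
    for n in sizes:
        estimates = [bootstrap_estimate([generator(rng) for _ in range(n)],
                                        statistic, B, rng)
                     for _ in range(M)]
        summary[n] = statistics.quantiles(estimates, n=4)
    return summary


def standard_normal(rng):
    return rng.gauss(0.0, 1.0)


print(run_study(standard_normal, statistics.median, sizes=[5, 20], B=20, M=50))
```

Each dictionary entry holds the three quartiles of the M bootstrap estimates for one sample size, which is the information a boxplot would display.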

The input requirements specific to BOOTST are explained below; the other inputs declared in the MAIN are specific to SMTBIO (see [Ref. 10]).

(1) ANS: 1 or 0. If the user wants to store each bootstrap estimate for each estimator, the answer should be 1. Estimates are stored in FILE 21.

(2) NE(I): a vector containing the sample sizes (n); up to 8 different sample sizes.

(3) IB: number of bootstrap replications for each execution.

(4) IX: seeds used to generate data (up to 3 different seeds).

If the user desires to obtain estimates and graphical displays for two or more different estimators and is using a large number B, say B ≥ 60, the amount of computer time required will increase significantly, depending on the system used.

The second way to execute BOOTST is recommended for more experienced users or for those who do not want to obtain boxplots of the estimates. This option saves a great deal of CPU time. For this option, the user will have to make some simple changes to the MAIN program:

(1) Delete from the input requirement section those inputs that apply only to SMTBIO (those not listed above).

(2) Replace the call to SMTBIO by the following sequence of calls:
    i) Call Data Generator (i.e., one of the data generators).
    ii) Call Estimator (i.e., one of the estimator functions). The estimator function (subroutine) makes the appropriate calls to the Bootstrap and Statistic subroutines.

(3) For this option, the input parameter ANS must be set to the integer 1. Also, on referring to the code, the user will notice that each estimator subroutine has a special parameter WI. This parameter must be deleted everywhere, since it applies only to SMTBIO.

The  computer  code  is  listed  below. 

C     UPDATED 07-03-86  W. CORTES-COLON
C
C     MAIN : DECLARATION, INPUT SECTION AND CALL FOR SMTBIO.
C
      COMMON IB,IX1,IX2,IX3,IX4,ANS
      COMMON Z(20000)
      CHARACTER*80 T1,T2,T3
      REAL*4 Y(10000),YMIN,YMAX,PMEAN(3),AMSEC(3)
      INTEGER NE(8),D,RG,SEI,SVS,N,M,L,NEST,NSR
      INTEGER IX1,IX2,IX3,IX4,IB,ANS
      EXTERNAL XMEAN,VARIA,COEVA,SECOR,MEDIA,TRIMM,VARI2,VARI3,BLREG
      EXTERNAL EXPON,UNIFO,NORML,GAMAF,POISF,GEOMF,LAPLA
C
C     INPUT REQUIREMENTS ARE READ FROM FILE BOSIN (UNIT 19)
      OPEN(UNIT=19,FILE='BOSIN')
      READ(19,*) ANS
   10 READ(19,*,END=999) N,M,L,D,RG,SEI,SVS,NEST,NSR
      READ(19,*) YMIN,YMAX
      READ(19,*) (NE(I),I=1,L)
      READ(19,*) IB
      WRITE(24,105) IB,(NE(I),I=1,L)
  105 FORMAT(I4,8I4)
      READ(19,*) IX1,IX2,IX3,IX4
      READ(19,115) T1
  115 FORMAT(A80)
      READ(19,115) T2
      READ(19,115) T3
      READ(19,*) (PMEAN(I),I=1,3)
      READ(19,*) (AMSEC(J),J=1,3)
C
C     CALL FOR SMTBIO: PRODUCES BOX-PLOT AND COMPARISON OF STATISTICS
      CALL SMTBIO(IX1,IX2,IX3,Y,N,M,NE,L,D,NSR,RG,SEI,SVS,YMIN,YMAX,
     *            NEST,NORML,XMEAN,T1,NORML,MEDIA,T2,NORML,TRIMM,T3,
     *            PMEAN,AMSEC)
      GO TO 10
  999 WRITE(6,*) 'END OF DATA INPUT'
      STOP
      END
C     DATA GENERATION SECTION
C
      SUBROUTINE EXPON(IX,X,NEK)
      REAL X(1)
      IF(NEK .LE. 0) RETURN
      CALL SEXPN(IX,X,NEK,1,0)
      RETURN
      END
C
      SUBROUTINE LAPLA(IX,X,NEK)
C     LAPLACE(0,1) VARIATES AS THE DIFFERENCE OF TWO EXPONENTIAL SAMPLES
      INTEGER ISEED
      REAL X(1),XU(1000),X2(1000)
      IF(NEK .LE. 0) RETURN
      CALL SEXPN(IX,X2,NEK,1,0)
      CALL SEXPN(IX,XU,NEK,1,0)
      DO 10 I=1,NEK
         X(I)=X2(I)-XU(I)
   10 CONTINUE
      RETURN
      END
C
      SUBROUTINE UNIFO(IX,X,NEK)
      REAL X(1)
      IF(NEK .LE. 0) RETURN
      CALL SRND(IX,X,NEK,1,0)
      RETURN
      END
C
      SUBROUTINE NORML(IX,X,NEK)
      REAL X(1)
      IF(NEK .LE. 0) RETURN
      CALL SNOR(IX,X,NEK,1,0)
      RETURN
      END
C
      SUBROUTINE GAMAF(IX,X,NEK)
C     SHAPE PARAMETER ALPHA OF THE GAMMA(ALPHA,1) DISTRIBUTION
      REAL X(1),ALPHA
      ALPHA=0.5
      IF(NEK .LE. 0) RETURN
      CALL SGAMA(IX,X,NEK,1,0,ALPHA)
      RETURN
      END
C
      SUBROUTINE POISF(IX,X,NEK)
C     MEAN LAMDA OF THE POISSON(LAMDA) DISTRIBUTION
      REAL X(1),LAMDA
      LAMDA=0.5
      IF(NEK .LE. 0) RETURN
      CALL SPOIS(IX,X,NEK,1,0,LAMDA)
      RETURN
      END
C
      SUBROUTINE GEOMF(IX,X,NEK)
C     PARAMETER P OF THE GEOMETRIC(P) DISTRIBUTION
      REAL X(1),P
      P=0.5
      IF(NEK .LE. 0) RETURN
      CALL SGEOM(IX,X,NEK,1,0,P)
      RETURN
      END
C

C     ESTIMATOR SECTION : BLREG IS USED FOR LINEAR REGRESSION ESTIMATION
C     ONLY. IT IS RECOMMENDED TO USE THIS ESTIMATOR SEPARATELY: I.E.,
C     WHEN CALLING SMTBIO, USE ONLY ONE ESTIMATOR.
C
      REAL FUNCTION BLREG(YOBS,NEK,WI)
      INTEGER IB,IX1,IX2,IX3,IX4,ANS
C     BLANK COMMON LAYOUT MUST MATCH THE ONE DECLARED IN MAIN
      COMMON IB,IX1,IX2,IX3,IX4,ANS
      REAL YOBS(1),BMSTAR(3),MSEBS
      REAL XDES1(600,3),XTRANS(3,600),XDES2(3,600),XTXINV(3,3)
      REAL RES1(600),YHAT(600),RSTAR(600),BHAT(3),YSTAR(600)
      REAL BSTAR(3)
      INTEGER WI
C     ORTHOGONAL DESIGN (NEK A MULTIPLE OF 4): COLUMN 1 ALL 1'S,
C     COLUMN 2 ALTERNATING -1,1, COLUMN 3 IN BLOCKS -1,-1,1,1
      DO 10 I=1,NEK
         YHAT(I)=0.0
         DO 10 J=1,3
            XDES1(I,J)=1.0
            XDES2(J,I)=0.0
            XTRANS(J,I)=0.0
   10 CONTINUE
      DO 20 I=1,NEK,2
         XDES1(I,2)=-1.0
   20 CONTINUE
      DO 30 I=1,NEK,4
         XDES1(I,3)=-1.0
         XDES1(I+1,3)=-1.0
   30 CONTINUE
C     (X'X)**(-1) = I/NEK FOR THIS DESIGN; OFF-DIAGONALS SET TO ZERO
      DO 40 I=1,3
         DO 35 J=1,3
            XTXINV(I,J)=0.0
   35    CONTINUE
         XTXINV(I,I)=1.0/FLOAT(NEK)
         BHAT(I)=0.0
   40 CONTINUE
C     XTRANS = X'
      DO 50 J=1,NEK
         DO 50 I=1,3
            XTRANS(I,J)=XDES1(J,I)
   50 CONTINUE
C     XDES2 = (X'X)**(-1) X'
      DO 60 K=1,3
         DO 60 J=1,NEK
            DO 60 I=1,3
               XDES2(K,J)=XDES2(K,J)+XTXINV(K,I)*XTRANS(I,J)
   60 CONTINUE
C     LEAST SQUARES ESTIMATE BHAT = (X'X)**(-1) X'Y, EQ. (3.18)
      DO 70 K=1,3
         DO 70 J=1,NEK
            BHAT(K)=BHAT(K)+XDES2(K,J)*YOBS(J)
   70 CONTINUE
C     FITTED VALUES AND RESIDUALS, EQ. (3.19)
      DO 90 J=1,NEK
         DO 80 I=1,3
            YHAT(J)=YHAT(J)+XDES1(J,I)*BHAT(I)
   80    CONTINUE
         RES1(J)=YOBS(J)-YHAT(J)
   90 CONTINUE
      DO 95 IWX=1,3
         BMSTAR(IWX)=0.0
   95 CONTINUE
      MSEBS=0.0
C     IB BOOTSTRAP REPLICATIONS OF THE RESIDUALS
      DO 100 IN=1,IB
         DO 110 JI=1,NEK
            RSTAR(JI)=RES1(JI)
  110    CONTINUE
         CALL BOOTS(RSTAR,NEK)
C        BOOTSTRAP DATA SET Y* = X BHAT + E*, EQ. (3.20)
         DO 120 K=1,NEK
            YSTAR(K)=YHAT(K)+RSTAR(K)
  120    CONTINUE
C        B* = BHAT + (X'X)**(-1) X'E*, ACCUMULATED OVER ALL NEK
C        RESIDUALS (EQUIVALENT TO EQ. (3.21))
         DO 130 K=1,3
            BSTAR(K)=BHAT(K)
            DO 130 KI=1,NEK
               BSTAR(K)=BSTAR(K)+XDES2(K,KI)*RSTAR(KI)
  130    CONTINUE
C        WRITE(6,5) (BSTAR(KL),KL=1,3)
C   5    FORMAT(3F8.4)
         DO 140 KJ=1,3
            BMSTAR(KJ)=BMSTAR(KJ)+BSTAR(KJ)
  140    CONTINUE
  100 CONTINUE
C     AVERAGE BOOTSTRAP VECTOR AND ITS SQUARED LENGTH
      DO 150 KH=1,3
         BMSTAR(KH)=BMSTAR(KH)/FLOAT(IB)
  150 CONTINUE
      DO 160 KI=1,3
         MSEBS=MSEBS+BMSTAR(KI)*BMSTAR(KI)
  160 CONTINUE
      BLREG=MSEBS
      IF(ANS.EQ.1.AND.WI.EQ.1) WRITE(21,102) BLREG
  102 FORMAT(F8.4)
      RETURN
      END
C
      REAL FUNCTION XMEAN(X,NEK,WI)
C     BOOTSTRAP ESTIMATE OF THE MEAN
      INTEGER IB,IX1,IX2,IX3,IX4,ANS
C     BLANK COMMON LAYOUT MUST MATCH THE ONE DECLARED IN MAIN
      COMMON IB,IX1,IX2,IX3,IX4,ANS
      REAL X(1),Y(1000),V(10),BB(1000)
      INTEGER WI
C     KEEP A COPY OF THE ORIGINAL SAMPLE: BOOTS OVERWRITES X
      DO 10 I=1,NEK
         Y(I)=X(I)
   10 CONTINUE
      DO 15 I=1,IB
         DO 20 JI=1,NEK
            X(JI)=Y(JI)
   20    CONTINUE
         CALL BOOTS(X,NEK)
         CALL BSTATS(X,NEK,V)
         BB(I)=V(1)
   15 CONTINUE
C     COMBINE THE IB BOOTSTRAP REPLICATIONS
      CALL BSTATS(BB,IB,V)
      XMEAN=V(1)
      IF(ANS.EQ.1.AND.WI.EQ.1) WRITE(21,102) XMEAN
  102 FORMAT(F8.4)
      RETURN
      END

      REAL FUNCTION VARIA(X,NEK,WI)
C     BOOTSTRAP ESTIMATE OF THE VARIANCE (V(2) FROM BSTATS)
      INTEGER IB,IX1,IX2,IX3,IX4,ANS
C     BLANK COMMON LAYOUT MUST MATCH THE ONE DECLARED IN MAIN
      COMMON IB,IX1,IX2,IX3,IX4,ANS
      REAL X(1),Y(1000),V(10),BB(1000)
      INTEGER WI
      DO 10 I=1,NEK
         Y(I)=X(I)
   10 CONTINUE
      DO 15 I=1,IB
         DO 20 JI=1,NEK
            X(JI)=Y(JI)
   20    CONTINUE
         CALL BOOTS(X,NEK)
         CALL BSTATS(X,NEK,V)
         BB(I)=V(2)
   15 CONTINUE
      CALL BSTATS(BB,IB,V)
      VARIA=V(1)
      IF(ANS.EQ.1.AND.WI.EQ.1) WRITE(21,102) VARIA
  102 FORMAT(F8.4)
      RETURN
      END

      REAL FUNCTION VARI2(X,NEK,WI)
C     BOOTSTRAP ESTIMATE OF THE VARIANCE WITH ESTIMATOR (3.14)
      INTEGER IB,IX1,IX2,IX3,IX4,ANS
C     BLANK COMMON LAYOUT MUST MATCH THE ONE DECLARED IN MAIN
      COMMON IB,IX1,IX2,IX3,IX4,ANS
      REAL X(1),Y(1000),V(10),BB(1000)
      INTEGER WI
      DO 10 I=1,NEK
         Y(I)=X(I)
   10 CONTINUE
      DO 15 I=1,IB
         DO 20 JI=1,NEK
            X(JI)=Y(JI)
   20    CONTINUE
         CALL BOOTS(X,NEK)
         CALL BSTATS(X,NEK,V)
         BB(I)=V(3)
   15 CONTINUE
      CALL BSTATS(BB,IB,V)
      VARI2=V(1)
      IF(ANS.EQ.1.AND.WI.EQ.1) WRITE(21,102) VARI2
  102 FORMAT(F8.4)
      RETURN
      END

      REAL FUNCTION VARI3(X,NEK,WI)
C     BOOTSTRAP AVERAGE OF THE MEAN SQUARED DEVIATION ABOUT THE
C     ORIGINAL SAMPLE MEAN (DIVISOR NEK)
      COMMON IB,ANS
      REAL X(1),Y(1000),V(10),BB(1000),SMEAN,DNEK
      INTEGER WI
      DNEK=NEK
      SMEAN=0.0
      DO 10 I=1,NEK
         Y(I)=X(I)
         SMEAN=SMEAN+X(I)
10    CONTINUE
      SMEAN=SMEAN/DNEK
      DO 15 I=1,IB
         DO 20 JI=1,NEK
            X(JI)=Y(JI)
20       CONTINUE
         CALL BOOTS(X,NEK)
C        SQUARED DEVIATIONS OF THE BOOTSTRAP SAMPLE ABOUT SMEAN;
C        BB(I) MUST START FROM ZERO IN EVERY REPLICATION
         BB(I)=0.0
         DO 30 JJ=1,NEK
            BB(I)=BB(I)+((X(JJ)-SMEAN)**2)
30       CONTINUE
         BB(I)=BB(I)/DNEK
15    CONTINUE
      CALL BSTATS(BB,IB,V)
      VARI3=V(1)
      IF(ANS.EQ.1.AND.WI.EQ.1) WRITE(21,102) VARI3
102   FORMAT(F8.4)
      RETURN
      END
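VARI3 differs from VARIA in how each replication is scored: instead of the divisor-(n-1) variance of the bootstrap sample about its own mean (V(2) of BSTATS), it sums squared deviations about the mean of the original sample and divides by n. In our own notation (not the program's), with x_1, ..., x_n the original sample, x̄ its mean, x*_{b1}, ..., x*_{bn} the b-th bootstrap sample, B = IB, and BB(I) starting from zero in each replication, VARI3 returns the first expression below. Because every x*_{bj} is drawn uniformly from the original observations, each inner average has resampling expectation equal to the plug-in (divisor n) variance, so VARI3 is a Monte Carlo approximation to that quantity which tightens as B grows.

```latex
\mathrm{VARI3} = \frac{1}{B}\sum_{b=1}^{B}\frac{1}{n}\sum_{j=1}^{n}\left(x^{*}_{bj}-\bar{x}\right)^{2},
\qquad
\mathrm{E}_{*}\!\left[\frac{1}{n}\sum_{j=1}^{n}\left(x^{*}_{bj}-\bar{x}\right)^{2}\right]
  = \frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}
```

Here E_* denotes expectation over the resampling alone, with the original sample held fixed.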

      REAL FUNCTION COEVA(X,NEK,WI)
C     BOOTSTRAP AVERAGE OF THE COEFFICIENT OF VARIATION, V(4) OF BSTATS
      COMMON IB,ANS
      REAL X(1),Y(1000),V(10),BB(1000)
      INTEGER WI
      DO 10 I=1,NEK
         Y(I)=X(I)
10    CONTINUE
      DO 15 I=1,IB
         DO 20 JI=1,NEK
            X(JI)=Y(JI)
20       CONTINUE
         CALL BOOTS(X,NEK)
         CALL BSTATS(X,NEK,V)
         BB(I)=V(4)
15    CONTINUE
      CALL BSTATS(BB,IB,V)
      COEVA=V(1)
      IF(ANS.EQ.1.AND.WI.EQ.1) WRITE(21,102) COEVA
102   FORMAT(F8.4)
      RETURN
      END

      REAL FUNCTION SECOR(X,NEK,WI)
C     BOOTSTRAP AVERAGE OF STATISTIC V(5) OF BSTATS
      COMMON IB,ANS
      REAL X(1),Y(1000),V(10),BB(1000)
      INTEGER WI
      DO 10 I=1,NEK
         Y(I)=X(I)
10    CONTINUE
      DO 15 I=1,IB
         DO 20 JI=1,NEK
            X(JI)=Y(JI)
20       CONTINUE
         CALL BOOTS(X,NEK)
         CALL BSTATS(X,NEK,V)
         BB(I)=V(5)
15    CONTINUE
      CALL BSTATS(BB,IB,V)
      SECOR=V(1)
      IF(ANS.EQ.1.AND.WI.EQ.1) WRITE(21,102) SECOR
102   FORMAT(F8.4)
      RETURN
      END

      REAL FUNCTION MEDIA(X,NEK,WI)
C     BOOTSTRAP AVERAGE OF THE SAMPLE MEDIAN, V(6) OF BSTATS
      COMMON IB,ANS
      REAL X(1),Y(1000),V(10),BB(1000)
      INTEGER WI
      DO 10 I=1,NEK
         Y(I)=X(I)
10    CONTINUE
      DO 15 I=1,IB
         DO 20 JI=1,NEK
            X(JI)=Y(JI)
20       CONTINUE
         CALL BOOTS(X,NEK)
         CALL BSTATS(X,NEK,V)
         BB(I)=V(6)
15    CONTINUE
      CALL BSTATS(BB,IB,V)
      MEDIA=V(1)
      IF(ANS.EQ.1.AND.WI.EQ.1) WRITE(21,102) MEDIA
102   FORMAT(F8.4)
      RETURN
      END
C 

      REAL FUNCTION TRIMM(X,NEK,WI)
C     BOOTSTRAP AVERAGE OF THE TRIMMED MEAN, V(7) OF BSTATS
      COMMON IB,ANS
      REAL X(1),Y(1000),V(10),BB(1000)
      INTEGER WI
      DO 10 I=1,NEK
         Y(I)=X(I)
10    CONTINUE
      DO 15 I=1,IB
         DO 20 JI=1,NEK
            X(JI)=Y(JI)
20       CONTINUE
         CALL BOOTS(X,NEK)
         CALL BSTATS(X,NEK,V)
         BB(I)=V(7)
15    CONTINUE
      CALL BSTATS(BB,IB,V)
      TRIMM=V(1)
      IF(ANS.EQ.1.AND.WI.EQ.1) WRITE(21,102) TRIMM
102   FORMAT(F8.4)
      RETURN
      END
C 

C    BOOTSTRAP       SECTION 
C 

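      SUBROUTINE BOOTS(X,NEK)
C     REPLACE X(1..NEK) BY A SAMPLE DRAWN FROM IT WITH REPLACEMENT
      COMMON IX
      REAL X(1),XB(1000),XX(1000)
C     SRND SUPPLIES NEK UNIFORM (0,1) DEVIATES IN XB (SEED IX)
      CALL SRND(IX,XB,NEK,2,0)
C     MAP EACH DEVIATE TO AN INDEX M IN 1..NEK
      DO 10 I=1,NEK
         A=XB(I)
         B=A*NEK
         M=INT(B+1)
         IF(M.GT.NEK) M=NEK
         XX(I)=X(M)
10    CONTINUE
      DO 20 I=1,NEK
         X(I)=XX(I)
20    CONTINUE
      RETURN
      END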
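BOOTS turns each uniform deviate U into the index M = INT(U*NEK+1), capped at NEK, so every observation is equally likely to be chosen and draws are made with replacement. The sketch below restates that step, together with the restore-resample-average loop shared by XMEAN and its sibling functions, in free-form Fortran 90. It is an illustration under assumptions, not the program that produced the tables: the intrinsic RANDOM_NUMBER stands in for SRND, and the names BOOT_SKETCH, RESAMPLE and BOOT_MEAN are hypothetical.

```fortran
! Sketch only: RANDOM_NUMBER replaces SRND; all names are hypothetical.
module boot_sketch
  implicit none
contains
  ! Overwrite x(1:n) with a bootstrap sample drawn with replacement
  ! from the original x(1:n), mirroring SUBROUTINE BOOTS.
  subroutine resample(x, n)
    integer, intent(in) :: n
    real, intent(inout) :: x(n)
    real :: u(n), xx(n)
    integer :: i, m
    call random_number(u)            ! n uniform deviates in [0,1)
    do i = 1, n
      m = int(u(i) * n) + 1          ! index in 1..n, each equally likely
      if (m > n) m = n               ! same guard as BOOTS
      xx(i) = x(m)
    end do
    x = xx
  end subroutine resample

  ! Average over nb bootstrap samples of y(1:n) of the sample mean,
  ! i.e. the quantity XMEAN returns (without its optional WRITE).
  real function boot_mean(y, n, nb)
    integer, intent(in) :: n, nb
    real, intent(in) :: y(n)
    real :: x(n), total
    integer :: b
    total = 0.0
    do b = 1, nb
      x = y                          ! restore the original sample
      call resample(x, n)
      total = total + sum(x) / real(n)
    end do
    boot_mean = total / real(nb)
  end function boot_mean
end module boot_sketch
```

A constant sample resamples to itself, so BOOT_MEAN applied to five copies of 2.5 returns 2.5 for any number of replications, which makes a convenient check. Keeping the original data in a separate argument Y, rather than overwriting the caller's X as the thesis routines do, leaves the input sample unchanged after the call.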
C 

C    STATISTICS      SECTION 
C 

      SUBROUTINE BSTATS(X,NEK,V)
      COMMON IB
      REAL X(1),V(10),ZW(5000),ZT(5000),R,BMDIAN
      REAL*8 XMEAN,SUM2,SUM3,SUM4,SUUM4,DEV,DVAR,VSTD,DNB,SCOR
      REAL*8 XTRIM,BTRIM,VARIA2
      INTEGER BTAIL,ETAIL
C     COMPUTE MEAN, STND DEVIATION, SKEWNESS, KURTOSIS, VARIANCE, CV,
C     MEDIAN, CORRELATION COEFF, AND TRIM(.05) MEAN.
      NB=NEK
      IF(NB.GT.1) GO TO 10
      WRITE(6,100) NB
100   FORMAT(2X,'SUBSAMPLE SIZE IS TOO SMALL',I6)
      RETURN
10    CONTINUE
      XMEAN=0.0D0
      DNB=NB
      DO 20 I=1,NB
         XMEAN=XMEAN+X(I)
20    CONTINUE
      XMEAN=XMEAN/DNB
      V(1)=XMEAN
C     TO GENERATE HIGHER MOMENTS
      SUM2=0.0D0
      SUM3=0.0D0
      SUM4=0.0D0
      DO 30 I=1,NB
         DEV=X(I)-XMEAN
         SUM2=SUM2+DEV**2
         SUM3=SUM3+DEV**3
         SUM4=SUM4+DEV**4
30    CONTINUE
C     BOOTSTRAP VARIANCE AND ITS STANDARD DEVIATION.
      DVAR=SUM2/(DNB-1.0D0)
      V(2)=DVAR
      VSTD=DSQRT(DVAR)
APPENDIX C
MSE* OF SOME ESTIMATORS USING THE BOOTSTRAP METHOD


Est. MSE of the Sample Mean of an EXP(1)

  B/n      10      20      25      40      50      70     100     140
    5  0.1213  0.0544  0.0531  0.0309  0.0257  0.0216  0.0142  0.0118
    8  0.1157  0.0570  0.0446  0.0299  0.0277  0.0164  0.0123  0.0103
   10  0.1131  0.0551  0.0453  0.0288  0.0247  0.0170  0.0134  0.0097
   15  0.1095  0.0543  0.0451  0.0277  0.0241  0.0164  0.0113  0.0099
   20  0.1064  0.0528  0.0432  0.0262  0.0252  0.0163  0.0131  0.0096
   25  0.1051  0.0525  0.0405  0.0270  0.0244  0.0153  0.0132  0.0097
   40  0.1022  0.0508  0.0417  0.0277  0.0245  0.0162  0.0122  0.0087
   60  0.1031  0.0511  0.0410  0.0258  0.0239  0.0159  0.0117  0.0091
  100  0.1030  0.0512  0.0420  0.0252  0.0244  0.0155  0.0119  0.0090
  140  0.1018  0.0511  0.0406  0.0256  0.0242  0.0156  0.0117  0.0092
  500  0.1007  0.0471  0.0368  0.0217  0.0202  0.0119  0.0101  0.0041

Est. MSE of the Sample Variance of an EXP(1)

  B/n      10      20      25      40      50      70     100     140
    5  0.9130  0.5313  0.4114  0.1690  0.1703  0.1120  0.0745  0.1363
    8  0.7783  0.4765  0.4023  0.1951  0.1538  0.1176  0.0847  0.0791
   10  0.7776  0.5418  0.4485  0.1703  0.1461  0.1393  0.0680  0.0800
   15  0.6732  0.5385  0.3457  0.1533  0.1433  0.1096  0.0650  0.0817
   20  0.6408  0.4589  0.3447  0.1562  0.1373  0.1043  0.0662  0.0852
   25  0.7115  0.4840  0.3452  0.1730  0.1311  0.0945  0.0656  0.0887
   40  0.6822  0.4692  0.3392  0.1556  0.1349  0.1179  0.0635  0.0808
   60  0.6959  0.4563  0.3265  0.1529  0.1341  0.1006  0.0658  0.0827
  100  0.6857  0.4668  0.3434  0.1555  0.1285  0.1185  0.0643  0.0753
  140  0.6789  0.4714  0.3259  0.1565  0.1280  0.1069  0.0592  0.0733
  500  0.6649  0.4603  0.3035  0.1429  0.1098  0.0937  0.0394  0.0563

Est. MSE of the Sample Coeff. of Variation of an EXP(1)

  B/n      10      20      25      40      50      70     100     140
    5  0.0667  0.0391  0.0285  0.0238  0.0183  0.0144  0.0090  0.0080
    8  0.0618  0.0352  0.0299  0.0249  0.0160  0.0156  0.0079  0.0080
   10  0.0618  0.0340  0.0269  0.0218  0.0169  0.0126  0.0084  0.0080
   15  0.0598  0.0336  0.0268  0.0221  0.0158  0.0127  0.0076  0.0079
   20  0.0599  0.0313  0.0263  0.0218  0.0156  0.0133  0.0077  0.0068
   25  0.0590  0.0323  0.0246  0.0223  0.0156  0.0137  0.0079  0.0074
   40  0.0584  0.0309  0.0255  0.0208  0.0153  0.0120  0.0073  0.0071
   60  0.0578  0.0313  0.0253  0.0214  0.0154  0.0127  0.0078  0.0070
  100  0.0580  0.0304  0.0249  0.0213  0.0151  0.0122  0.0070  0.0073
  140  0.0573  0.0308  0.0252  0.0215  0.0147  0.0123  0.0074  0.0074
  500  0.0419  0.0297  0.0204  0.0187  0.0115  0.0100  0.0057  0.0039

Figure  MSE* of the Estimators for EXP(1).
Figure C.3  MSE* of ^S ^, ^S ^ and of ^S' ^